Abstract

We provide a reformulation of finite dimensional quantum theory in the 
circuit framework in terms of mathematical axioms, and a reconstruction 
of quantum theory from operational postulates. 

We consider operations, $\mathsf{A}_{a_1}^{b_2}$ (where $a_1$ is the input and
$b_2$ is the output). We also consider operators,
$\hat{A}_{a_1}^{b_2} \in \mathcal{V}_{a_1} \otimes \mathcal{V}^{b_2}$, where
$\mathcal{V}_{a_1}$ ($\mathcal{V}^{b_2}$) is the space of Hermitian operators on
the complex Hilbert space $\mathcal{H}_{a_1}$ ($\mathcal{H}_{b_2}$).
We say operations correspond to operators if the probability for a circuit
is given by replacing operations with operators then taking the trace.
For example, $\mathrm{Prob}(\mathsf{A}^{a_1} \mathsf{B}_{a_1}^{b_2} \mathsf{C}_{b_2}) = \hat{A}^{a_1} \hat{B}_{a_1}^{b_2} \hat{C}_{b_2}$
(trace is implicit for repeated labels). The mathematical axioms for quantum
theory are the following

Axiom 1 Operations correspond to operators. 

Axiom 2 Every complete set of physical operators corresponds to a com- 
plete set of operations. 

Physical operators have the property that they are positive after taking 
the partial transpose over the input space. 

We show that these mathematical axioms are equivalent to a set of 
postulates couched in operational terms. A maximal set of distinguishable 
states is any set containing the maximum number of states for which 
there exists some measurement, called a maximal measurement, which 
can identify which state from the set we have in a single shot. A maximal 
effect is associated with each result of a maximal measurement. States 
are represented by vectors whose entries are probabilities. A set of states 
is said to be non-flat if it is a spanning subset of the full set of states that 
give rise only to some subset of outcomes of some maximal measurement. 
We show that classical probability theory and quantum theory are the 
only two theories consistent with the following set of postulates. 

P1 Sharpness. Associated with any given pure state is a unique maximal
effect giving probability equal to one. This maximal effect does not 
give probability equal to one for any other pure state. 

P2 Information locality. A maximal measurement on a composite sys- 
tem is effected if we perform maximal measurements on each of the 
components. 

P3 Tomographic locality. The state of a composite system can be deter- 
mined from the statistics collected by making measurements on the 
components. 

P4' Permutability. There exists a reversible transformation on any sys- 
tem effecting any given permutation of any given maximal set of 
distinguishable states for that system. 

P5 Sturdiness. Filters are non-flattening. 

We single out quantum theory if we replace P4' by 

P4 Compound permutability. There exists a compound reversible trans- 
formation on any system effecting any given permutation of any 
given maximal set of distinguishable states for that system. 

A compound transformation is one that can be made from two sequential 
transformations (neither equal to the identity). 
Prelude: non-flattening transformations



In physics we need to get control of some part of the world when we do ex- 
periments. One device we often use for this purpose is a filter. A filter cuts 
away the part of the world we are not interested in leaving only that bit we 
wish to do experiments on. There are many filtering devices in typical quantum 
experiments. For example, we may have pinholes or slits that allow only parti- 
cles with a particular range of positions through. We may use frequency filters 
which allow only particles with a particular range of frequencies through. We 
have velocity selectors that narrow down the range of velocities passing through. 
Filters are, effectively, what we use to define the system we are interested in. 

To define what we mean by a filter in operational terms we need a few 
basic notions. A maximal set of distinguishable states for a system is any set 
of states containing the maximum number of states for which there exists some 
measurement, called a maximal measurement, which can identify which state 
from the set we have in a single shot. In quantum theory a maximal set of 
distinguishable states would be those corresponding to any orthonormal basis 
of the Hilbert space. A maximal measurement corresponds to one that measures 
a non-degenerate operator (i.e. a projection valued measure consisting only of 
rank one projectors). 

It is interesting to consider restricting to those states which only give rise to 
a subset of outcomes of a maximal measurement. We define 

An informational subset of states is the full set of states which 
only give rise to some given subset of outcomes of a given maximal 
measurement (and give probability zero for the other outcomes). 

Systems whose state is restricted to belong to a given informational subset of 
states have their information carrying capacity constrained. 

In general operational theories, states can be represented by vectors whose 
entries are probabilities or by objects that are given by a linear map acting on 
such vectors of probabilities. In quantum theory, states can be represented by 
density operators which are, indeed, linearly related to probabilities. We wish 
to define an important kind of set of states. 

Non-flat sets of states. A set of states is non-flat if it is a spanning 
subset of some informational subset of states. 

Any set of states that is not non-flat is said to be flat. A non-flattening trans- 
formation is one that transforms any non-flat input set of states into a non-flat 
output set. 

We define a filter with respect to a particular subset of outcomes associated 
with a particular maximal measurement. 

A filter is a transformation that passes unchanged those states 
which would give rise only to the given subset of outcomes of the 
given maximal measurement and blocks states which would give rise
only to the complement set of outcomes. 
In quantum theory a filter corresponds to projecting the state onto a given
subspace of the Hilbert space. For example, we may start out with a system 
associated with a Hilbert space of dimension 5. We can project onto some 
particular 3 dimensional subspace of this Hilbert space. The system after such 
a filtering operation is associated with the 3 dimensional Hilbert space. 

The fifth postulate of the reconstruction of quantum theory to be given in
Part IV is the following.

P5 Sturdiness. Filters are non-flattening. 

This basically says, in a certain sense, that filters do not destroy more infor- 
mation than necessary. If the informational subset associated with a particular 
non-flat set of states is the same as the informational subset associated with 
the given filter then the states will pass through unchanged and hence remain 
non-flat. But we can also have situations where the states in the non-flat set 
get partially absorbed by the filter. P5 implies that, even in this case, the
outgoing set of states will be non-flat. This implies that sets of states are, in a
certain sense, sturdy against a fairly dramatic transformation. On the other
hand, given that there is no need for filtering transformations to flatten sets
of states, it is reasonable that such transformations be non-flattening. Filters
are, indeed, non-flattening in both classical probability theory and in quantum
theory (see Appendix A).

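In this prelude we will illustrate how this postulate works in the context
of quantum theory with examples. Consider, for example, a four dimensional
Hilbert space, $\mathcal{H}_4$, with orthonormal basis $|n\rangle$ where $n = 1$ to $4$. Define

$$|mnx\rangle = \frac{1}{\sqrt{2}}\left(|m\rangle + |n\rangle\right) \quad \text{and} \quad |mny\rangle = \frac{1}{\sqrt{2}}\left(|m\rangle + i|n\rangle\right) \tag{1}$$

Define

$$\rho_n = |n\rangle\langle n|, \quad \rho_{mnx} = |mnx\rangle\langle mnx|, \quad \rho_{mny} = |mny\rangle\langle mny| \tag{2}$$

Now consider the following three sets of states for this space.

$$\text{Set } A = \{\rho_1, \rho_2, \rho_4, \rho_{12x}, \rho_{12y}, \rho_{14x}, \rho_{14y}, \rho_{24x}, \rho_{24y}\} \tag{3}$$
$$\text{Set } B = \{\rho_1, \rho_2, \rho_4, \rho_{12x}, \rho_{24y}, \rho_{14y}\} \tag{4}$$
$$\text{Set } C = \{\rho_1, \rho_2, \rho_{12x}, \rho_{12y}\} \tag{5}$$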

Sets A and B contain states only having support on the three dimensional
Hilbert space spanned by $\{|1\rangle, |2\rangle, |4\rangle\}$. Further, there does not exist a Hilbert
space of smaller dimension which supports the states in either of these sets. In
quantum theory the space spanned by the positive operators acting on an $N$
dimensional Hilbert space is of dimension $N^2$ (this is the number of real
parameters required to specify a density matrix having support on an $N$
dimensional Hilbert space). The states in set A span the space of operators
acting on this three dimensional Hilbert space (there are $9 = 3^2$ linearly
independent states in A) and hence constitute a non-flat set. The same is not
true of the states in set B (it contains only six states, fewer than the nine
needed) and hence this set is flat. Set C has support on the two dimensional
Hilbert space spanned by $|1\rangle$ and $|2\rangle$. There are $4 = 2^2$ linearly independent
states in this set and hence the set is non-flat.
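
The counting argument above can be checked numerically. The following is a minimal sketch in plain Python and is not part of the original construction: the helper names (`ket`, `proj`, `real_vector`, `rank`, `span_dim`) are illustrative, and it assumes that the last element of Set B is the state built from $|14y\rangle$. Each operator is flattened into a real vector and the number of linearly independent states is found by Gaussian elimination.

```python
import math

def ket(amps, dim=4):
    """State vector from {1-based basis label: amplitude}."""
    v = [0j] * dim
    for n, a in amps.items():
        v[n - 1] = complex(a)
    return v

def proj(v):
    """Rank-one operator |v><v| as a nested list of complex numbers."""
    return [[a * b.conjugate() for b in v] for a in v]

def real_vector(op):
    """Flatten an operator into real coordinates (real and imaginary parts)."""
    return [x for row in op for z in row for x in (z.real, z.imag)]

def rank(vectors, tol=1e-9):
    """Rank of a list of real vectors via Gaussian elimination with pivoting."""
    rows = [list(v) for v in vectors]
    r = 0
    for c in range(len(rows[0])):
        if r == len(rows):
            break
        pivot = max(range(r, len(rows)), key=lambda i: abs(rows[i][c]))
        if abs(rows[pivot][c]) < tol:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def span_dim(states):
    """Number of linearly independent operators in a set of states."""
    return rank([real_vector(s) for s in states])

S = 1 / math.sqrt(2)

def rho(n):
    return proj(ket({n: 1}))             # |n><n|

def rho_x(m, n):
    return proj(ket({m: S, n: S}))       # |mnx><mnx|

def rho_y(m, n):
    return proj(ket({m: S, n: 1j * S}))  # |mny><mny|

set_a = [rho(1), rho(2), rho(4), rho_x(1, 2), rho_y(1, 2),
         rho_x(1, 4), rho_y(1, 4), rho_x(2, 4), rho_y(2, 4)]
set_b = [rho(1), rho(2), rho(4), rho_x(1, 2), rho_y(2, 4), rho_y(1, 4)]
set_c = [rho(1), rho(2), rho_x(1, 2), rho_y(1, 2)]

for name, states in (("A", set_a), ("B", set_b), ("C", set_c)):
    print(name, span_dim(states))
```

A set with support on an $N$ dimensional subspace is non-flat when this count equals $N^2$, so in this sketch A (count 9 on a three dimensional subspace) and C (count 4 on a two dimensional subspace) qualify, while B (count 6 on a three dimensional subspace) does not.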

Now consider sending the states in set A through a filter that projects onto
the two dimensional Hilbert space spanned by $|1\rangle$ and $|2\rangle$. The projection
operator associated with this filter is

$$F = |1\rangle\langle 1| + |2\rangle\langle 2| \tag{6}$$

By applying this projector to the states in set A we obtain

$$\text{Set } A_F = \{\rho_1, \rho_2, 0, \rho_{12x}, \rho_{12y}, \tfrac{1}{2}\rho_1, \tfrac{1}{2}\rho_1, \tfrac{1}{2}\rho_2, \tfrac{1}{2}\rho_2\} \tag{7}$$

These states have support on the two dimensional Hilbert space spanned by $|1\rangle$ and
$|2\rangle$. Further, we see that the set of states is non-flat since these states span
the space of operators acting on this two dimensional Hilbert space (there are
$4 = 2^2$ linearly independent states amongst the states in $A_F$). Hence, when we
send in the non-flat set of states A, we get out a non-flat set of states $A_F$.

Consider sending the states in set C through a filter that projects onto
the two dimensional Hilbert space spanned by the orthonormal vectors $|1\rangle$ and
$|23x\rangle$. The projector associated with this filter is

$$G = |1\rangle\langle 1| + |23x\rangle\langle 23x| \tag{8}$$

The states in C become

$$\text{Set } C_G = \{\rho_1, \tfrac{1}{2}\rho_{23x}, |x'\rangle\langle x'|, |y'\rangle\langle y'|\} \tag{9}$$

where

$$|x'\rangle = \frac{1}{\sqrt{2}}|1\rangle + \frac{1}{2}|23x\rangle \quad \text{and} \quad |y'\rangle = \frac{1}{\sqrt{2}}|1\rangle + \frac{i}{2}|23x\rangle \tag{10}$$

The states in set $C_G$ are clearly non-flat also. We see that in this case, if we send
the non-flat set of states, C, into this filter we get out a non-flat set of states.
Interestingly, in this case, the states get closer to the $|1\rangle\langle 1|$ state. The version
of this fact that holds for general bits will play a role in the reconstruction of
quantum theory from operational postulates.
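
Both filter examples can be reproduced with a few lines of linear algebra. The sketch below is illustrative only (plain Python, hypothetical helper names such as `filtered` and `overlap_with`): it models a filter as the map taking a state $\rho$ to $P \rho P$ for a projector $P$, counts linearly independent output states, and compares the normalized overlap with $|1\rangle\langle 1|$ before and after the second filter.

```python
import math

def ket(amps, dim=4):
    """State vector from {1-based basis label: amplitude}."""
    v = [0j] * dim
    for n, a in amps.items():
        v[n - 1] = complex(a)
    return v

def proj(v):
    """Rank-one operator |v><v|."""
    return [[a * b.conjugate() for b in v] for a in v]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def matmul(a, b):
    n = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(len(b[0]))]
            for i in range(len(a))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def filtered(p, rho):
    """Unnormalized state after the filter with projector p: p rho p."""
    return matmul(matmul(p, rho), p)

def span_dim(states, tol=1e-9):
    """Number of linearly independent operators (Gaussian elimination)."""
    rows = [[x for r in s for z in r for x in (z.real, z.imag)] for s in states]
    r = 0
    for c in range(len(rows[0])):
        if r == len(rows):
            break
        pivot = max(range(r, len(rows)), key=lambda i: abs(rows[i][c]))
        if abs(rows[pivot][c]) < tol:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def overlap_with(target, rho):
    """Normalized overlap Tr(target rho) / Tr(rho)."""
    return (trace(matmul(target, rho)) / trace(rho)).real

S = 1 / math.sqrt(2)
rho = lambda n: proj(ket({n: 1}))
rho_x = lambda m, n: proj(ket({m: S, n: S}))
rho_y = lambda m, n: proj(ket({m: S, n: 1j * S}))

set_a = [rho(1), rho(2), rho(4), rho_x(1, 2), rho_y(1, 2),
         rho_x(1, 4), rho_y(1, 4), rho_x(2, 4), rho_y(2, 4)]
set_c = [rho(1), rho(2), rho_x(1, 2), rho_y(1, 2)]

F = add(rho(1), rho(2))       # projector onto span{|1>, |2>}
G = add(rho(1), rho_x(2, 3))  # projector onto span{|1>, |23x>}

a_f = [filtered(F, r) for r in set_a]
c_g = [filtered(G, r) for r in set_c]

before = overlap_with(rho(1), rho_x(1, 2))
after = overlap_with(rho(1), filtered(G, rho_x(1, 2)))
print(span_dim(a_f), span_dim(c_g), before, after)
```

In this sketch both output sets contain four linearly independent operators, matching the $N^2 = 4$ needed for a two dimensional subspace, and the overlap of the $|12x\rangle$ state with $|1\rangle\langle 1|$ rises from 1/2 to 2/3 after the filter G.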

These are just a few examples. It turns out that, in quantum theory, if we
send any non-flat set of states into a filter then we get a non-flat set of states
out. For a proof of this see Appendix A.

In quantum theory filters also have the property that they send pure states
to pure states (up to normalization). We call transformations that do this
non-mixing transformations. Interestingly, in quantum theory, it turns out that all
non-mixing transformations are also non-flattening (see Appendix A). This is
not surprising. In general probabilistic theories, it is quite hard to see how we
could flatten sets of states in such a way that all pure states remain pure (up to
normalization). We conjecture in the postlude that P5 can be replaced by the
postulate that filters are non-mixing.
Part I

Introduction 

In this work we will give a reformulation of quantum theory in terms of
mathematical axioms (in Part III) and a reconstruction of quantum theory from
operational postulates (in Part IV). We show that the mathematical axioms
are equivalent to the usual formulation of quantum theory (in terms of density
matrices, positive operator valued measures, and completely positive maps). We
consider only the case of finite dimensional Hilbert spaces. We then show that
the operational postulates are equivalent to the mathematical axioms.

1 Background 
1.1 Motivation 

In quantum theory (QT) states and measurement outcomes are represented by 
positive operators on a complex Hilbert space. The probability for any particular 
outcome is given by the trace rule (also known as Born's rule). Evolution is 
given by completely positive maps (examples being unitary operators and von
Neumann projection). This structure is rather abstract. Why Hilbert space?
Why complex Hilbert space? Why represent states and measurement outcomes 
in this way? Why do the usual postulates of quantum theory take the particular 
form they do? In physics we answer "why" questions like this by finding simpler 
more natural postulates, axioms, or laws. For example Kepler's three laws of 
planetary motion were empirically adequate for predicting planetary motion 
at the time. However they are rather ad hoc. One could argue that they are 
explained by Newton's three laws of motion plus his universal law of gravitation. 
The Lorentz transformations are rather abstract and not at all natural in and of 
themselves. However, they were accounted for in a natural way by Einstein's two 
postulates (that the laws of physics are the same in every inertial frame and that 
the speed of light is independent of the source). Quantum theory is ad hoc and 
abstract in the same way that Kepler's laws and the Lorentz transformations 
are. What is needed is a set of more natural postulates from which QT follows.

To gain an idea of how such postulates might look we need to think about 
what kind of theory QT is. Quantum theory applies to a wide range of physical 
phenomena - spin degrees of freedom, interferometers, tunneling particles, etc. 
However, what all these applications have in common is that quantum theory 
is used to calculate probabilities. Quantum theory is a probability calculus. In 
this sense, its natural predecessor is not Newtonian physics, but rather what 
might be called classical probability theory (CProbT). CProbT is the calculus 
used to calculate probabilities for classical situations like tossing coins, throwing 
dice, predicting the weather, and so on. Like quantum theory, it comprises a set 
of rules which apply to a wide range of physical phenomena. In writing down 
postulates for classical probability theory we had better not make them specific 
to one type of situation (e.g. dice) in which the theory might be applied.
Likewise, natural postulates for quantum theory should be natural for any situation
in which quantum theory might be applied. Consequently we must expect a 
certain level of abstraction - the postulates cannot make necessary reference 
to particular physical quantities (for example, position and momentum) which 
are only defined for some types of situation where QT might be applied. Such 
considerations limit how "physical" the postulates should be. To deal with 
this issue we will outline a rather general operational framework (the circuit 
framework) pertaining to a wide range of physical phenomena. 

1.2 Previous work 

There is a long tradition of thinking about deriving quantum theory from more
reasonable axioms or postulates going back to von Neumann [72] and Mackey
[52]. Much of the early work was in the quantum logical tradition, such as
the papers of Birkhoff and von Neumann [8], Zierler [76], and Piron [62]. The
convex probabilities framework (basically this is the idea of representing states
as vectors of probabilities) goes back originally to Mackey and has been
worked on by many others since, including Ludwig [51], Davies and Lewis [19],
Gunson [35], Mielnik [57], Araki [3], Gudder et al. [34], Foulis and Randall
[25], and Fivel [23].

In the past decade a number of papers have been written on the topic of 
reconstructing quantum theory [3i[Il[Tl[7J[MllMllSIJ[ni[Mll31llHl[Ml 
|2~4"1 IT3] . In 2009 a conference on reconstructing quantum theory was held at 
Perimeter Institute (the talks can be viewed on PIRSA [21]). Many of these 
have been inspired by ideas coming from quantum information which, generally, 
consider finite dimensional Hilbert spaces. This program was very much inspired 
by Fuchs's suggestion that we need to find information-theoretic reasons for the 
quantum axioms (presented in a number of talks and written up in [26]).

Much of this work is in the convex probabilities framework. Recent treatments
and developments of this framework can be found in [36], [6], [5], [14], [13],
[40], [42], [41], [44]. In parallel with this work, Abramsky and Coecke developed a
categorical approach to quantum theory [1]. One of the most salient features of
the categorical approach is that it gives rise to a kind of pictorialism [IB]. These
pictures are basically the circuits in the circuit model presented in Part II.
Motivated in part by this pictorial approach, Chiribella, D'Ariano, and Perinotti
[14], [13] and the present author [40], [42], [41], [44] put forward various frameworks
which show how probabilities can be put on top of such pictures in accord with
the convex probabilities framework.

In order to reconstruct quantum theory from operational postulates we need 
to specify what we mean by quantum theory. To this end, we will provide a 
reformulation of quantum theory using the duotensor framework [44] in terms 
of two mathematical axioms. Although this mathematical reformulation was 
developed to aid the operational reconstruction, it stands alone and may be 
more interesting to some readers than the operational reconstruction. The key
point of this reformulation is that, rather than associating a completely positive
map with an operation, we associate a positive operator acting on the tensor
product of the input and output Hilbert spaces.

The two mathematical axioms presented here are part of a development 
of ideas by the author beginning with [36] in which a framework for general 
probabilistic theories was given. This framework requires fixed causal structure. 
In a theory of quantum gravity we do not expect fixed causal structure. To 
address this, the causaloid framework [38] was developed for theories in which we
do not need to have fixed causal structure. We cannot assume an evolving state 
in such a situation. Hence, in the causaloid framework, mathematical objects 
apply to arbitrary regions of spacetime. We use the causaloid product to combine 
such objects for non-overlapping regions. Quantum theory was formulated in 
the causaloid framework. In the case of quantum theory we do have fixed causal 
structure so the full machinery of the causaloid approach may be more than is 
necessary. It is instructive, then, to apply the kind of thinking in the causaloid 
formalism where we consider arbitrary regions of spacetime to the situation in 
which we do have definite causal structure. This was done in the duotensor 
framework [13] . Motivated by the work of Abramsky and Coecke [T] , pictorial 
techniques are used. In the present work, we use the duotensor framework 
to associate operators with fragments of a circuit (fragments are the analogue 
of an arbitrary region of spacetime in the circuit framework). Such operators 
can be combined with the circuit trace to obtain the operator for a composite 
fragment. It turns out that this approach is very similar to the quantum combs 
framework of Chiribella, D'Ariano, and Perinotti (CDP) [12]. They associate 
Choi-Jamiołkowski operators with fragments and provide the link product for
combining them. The basic equations of CDP are related to the basic equations 
in the reformulation given here by appropriate insertion of partial transposes. 
The formula for the circuit trace is a little simpler than the link product, but 
the idea is similar. The circuit trace is an example of the causaloid product, as, 
effectively, is the link product. Related ideas appear in the work of Aharonov, 
Popescu, Tollaksen, and Vaidman [5] who consider multiple-time states and 
Oeckl |5T] who has developed a general boundary formalism for quantum theory. 
The works of Aharonov et al. and of Oeckl apply to the pure state case whereas
the quantum combs framework and the framework to be presented here apply 
to the general mixed state case. There are many other related approaches 
in which particular attention is given to issues concerning causality. Sorkin 
has developed the causal set approach to quantum gravity [66]. Markopoulou 
developed the quantum causal histories approach [54], and a dual point of view
to this was provided by Blute, Ivanov, and Panangaden [5]. Leifer [5U] has also 
done interesting work concerning the evolution of quantum systems on a causal 
circuit. 

The project of reconstructing quantum theory from operational postulates
presented here is a continuation of a project initiated by the author ten years
ago [36]. There a small number of operational axioms were given from which
quantum theory can be reconstructed (see also Sec. [14]). One of the axioms
given in [36] is not particularly compelling. This is the simplicity axiom which
says, basically, that states are specified by the smallest number of probabilities
consistent with the other axioms. This forces us to take the second simplest
case in the Wootters hierarchy [75], [74] of theories (see Sec. [10.9]).

The problem of replacing the simplicity axiom with a more compelling axiom
was left open for a long time. Then, in 2009, two papers appeared which
addressed the problem. First, Chiribella, D'Ariano, and Perinotti [T3] showed
that the success probability for probabilistic teleportation is bounded by the
inverse of the number of probabilities required to specify a state (see Lemma 22
in [2]). In a subsequent paper they used this in a full derivation of quantum
theory from operational axioms with no need for a simplicity axiom [13].
Second, Dakic and Brukner [17] gave an argument to get rid of the simplicity axiom
in a derivation of quantum theory based on the ideas in [55]. They argued for
a bound on the number of probabilities required to specify the state coming
from considering entangled states for two generalized bits. Their argument was
sharpened by Masanes and Muller [55] in another reconstruction of quantum
theory.

In this work we present a new set of postulates. These are mostly
different from the axioms in [36]. We adopt the technique of Chiribella, D'Ariano,
and Perinotti [14], [13] to avoid the need for a simplicity axiom, though the
approach of Dakic and Brukner [17] with the improvements due to Masanes and
Muller [55] may also be applicable here.

To guide the reader's intuition, sometimes remarks will be included in square
brackets [like this] that discuss how things look in classical probability theory
or quantum theory.

All figures in this paper were drawn using version 1.1 of the duotenzor
package (see Appendix [F]).



2 Main results 

2.1 The circuit framework 

In Part [II] we will show how to describe certain types of experiment in opera- 
tional terms within what we will call the circuit framework. In this model, an 
experiment consists of a bunch of apparatuses placed next to each other so that 
apertures on the apparatuses are aligned with one another. Each apparatus 
may have outcomes on it (as read off meters or detector clicks for example). 
Each apparatus use will be associated with an operation (represented by a box 
in the graphical representation of a circuit). An operation is associated with an 
outcome set, this being a subset of the possible outcomes on the apparatus. An 
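operation has a bunch of inputs and outputs and can be represented as

[Figure: a box labeled A with input wires entering at the bottom and output wires leaving at the top, together with the corresponding symbolic notation] (11)
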
for example. The labels a, b, etc. correspond to different system types. The
inputs enter the box at the bottom and are represented by subscripts in the 
symbolic notation. Outputs leave the box at the top and are represented by 
superscripts in the symbolic notation. The alignment of apertures is represented 
by a wire. A circuit consists of a bunch of operations wired together so that 
there are no inputs or outputs left over. For example,

[Figure: an example circuit, shown as a diagram together with its symbolic notation] (12)

In the symbolic notation the repeated integer label indices correspond to the 
placement of the wires. Such circuits are to be understood graphically. It is 
common to think of the vertical axis as corresponding to a background New- 
tonian time in interpreting circuit diagrams. In this case it would matter how 
high up the page a box is placed. We do not think in this way here. There is 
absolutely no significance to the vertical position of the boxes on the page in 
the diagrams in this work. The boxes can be moved to any position. As long as
they maintain their orientation (so inputs remain as inputs and outputs remain 
as outputs) and the wires (which can be stretched) continue to be connected to 
the boxes in the same way, the diagram does not change its meaning. Thinking 
about experiments as circuits provides a deeper foundation for understanding, 
in operational terms, notions like system and state and will form the backdrop 
against which the postulates are set. We make three assumptions as part of the 
circuit model which are regarded as being too basic to be part of the postulate 
set. The first two are (Assump 1) that we can associate a probability with 
a circuit which depends only on the description of that circuit, and (Assump 
2) that non-trivial finite systems exist. The third assumption says, basically, 
that hypothetical states which are operationally indiscernible from some exist- 
ing state to any accuracy actually exist (in fact the assumption is a bit more 
general than this). This assumption is one of mathematical convenience and 
allows us to deduce that the sets of states are closed. It is not possible to
operationally distinguish the case where we make this third assumption from the
case where we do not as we cannot make arbitrarily accurate measurements. 
Although very reasonable, these basic assumptions may have to be modified in 
a theory of quantum gravity. 

2.2 The reformulation 

In Part III we will provide a reformulation of quantum theory in which objects
called duotensors mediate between circuits composed of operations (as in the
circuit framework) and circuits composed of operators (these are mathematical
objects acting on complex Hilbert spaces).

A duotensor [33] is like a tensor but with a bit more structure. In Sec. [5]
we will show how to associate duotensors with operations if operations have a
certain property - that they are fully decomposable. It turns out that postulate
P3 is equivalent to full decomposability of operations. By associating operations
with duotensors we can convert a circuit into a duotensor calculation for the
probability associated with that circuit. The ideas in this section were first
presented in [33].

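In Sec. [7] we will show how to associate operators with duotensors. We
consider the space, $\mathcal{V}_a$, of Hermitian operators on a Hilbert space $\mathcal{H}_a$ of dimension
$N_a$. We also consider the space, $\mathcal{V}^a$, of Hermitian operators on a Hilbert space
$\mathcal{H}^a$ also of dimension $N_a$. In general we are interested in operators in the space

$$\mathcal{V}_{a_1 b_2 \cdots c_3}^{d_4 e_5 \cdots f_6} := \mathcal{V}_{a_1} \otimes \mathcal{V}_{b_2} \otimes \cdots \otimes \mathcal{V}_{c_3} \otimes \mathcal{V}^{d_4} \otimes \mathcal{V}^{e_5} \otimes \cdots \otimes \mathcal{V}^{f_6} \tag{13}$$

We represent an operator in this space as

[Figure: a box labeled A with input wires a, b, c entering at the bottom and output wires d, e, f leaving at the top] (14)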

It turns out that operators have the property that they are fully decomposable. 
This means that we can associate a duotensor with every operator. We can wire 
these operators together to form circuits. The wires tell us how to match up 
the different parts of the tensor product space. 

We can place wires between operators (denoted by a repeated label in
symbolic notation). A wire (or repeated label) indicates that we are taking the
trace. For example,

$$\hat{A}^{a_1} \hat{B}_{a_1} \tag{15}$$

This is equal to the trace of the product of $\hat{A}^{a_1} \in \mathcal{V}^{a_1}$ and $\hat{B}_{a_1} \in \mathcal{V}_{a_1}$. More
generally we have expressions such as A 3lh2 B^ 34 C^ C33i where we have more 
than one wire. In this case the wire (or repeated label) indicates that we take 
the partial trace over the corresponding spaces. This means that, for example, 

A^B^C^ G V!^ (16) 

This is similar to Einstein's summation convention. Here the partial trace is
implicit wherever we have a repeated index (or wire).
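
The wire convention can be made concrete with small matrices. The sketch below is a hedged illustration in plain Python rather than the formalism of Sec. [7]: the function names (`wire`, `kron`, `partial_wire`) are hypothetical, and the ordering of tensor factors is an assumption made for the example. A single wire is the trace of a product, and a wire attached to one factor of a composite operator is a partial trace over that factor.

```python
def wire(a, b):
    """One repeated label between two operators on the same space: Tr(a b)."""
    n = len(a)
    return sum(a[i][j] * b[j][i] for i in range(n) for j in range(n))

def kron(x, y):
    """Tensor product x (x) y with row index i*dim_y + k and column index j*dim_y + l."""
    dx, dy = len(x), len(y)
    return [[x[i][j] * y[k][l] for j in range(dx) for l in range(dy)]
            for i in range(dx) for k in range(dy)]

def partial_wire(m, b, dim_a, dim_b):
    """Wire the second factor of m (acting on H_a (x) H_b) to b: Tr_b[m (1 (x) b)]."""
    return [[sum(m[i * dim_b + k][j * dim_b + l] * b[l][k]
                 for k in range(dim_b) for l in range(dim_b))
             for j in range(dim_a)]
            for i in range(dim_a)]

# A preparation |0><0| wired to an effect |+><+| gives a probability.
preparation = [[1, 0], [0, 0]]
effect = [[0.5, 0.5], [0.5, 0.5]]
p = wire(preparation, effect)

# A product operator X (x) Y wired to `effect` on the second factor leaves an
# operator on the first factor, equal to X times Tr(Y effect).
X = [[1, 2], [2, 3]]
Y = [[2, 1j], [-1j, 1]]
reduced = partial_wire(kron(X, Y), effect, 2, 2)
scale = wire(Y, effect)
print(p, scale, reduced)
```

When every label is repeated the result is a number, as in the first example; when some labels are left open the result is an operator on the remaining spaces, as in the second.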
In Sec. [8] we will bring the duotensor treatment of operations and operators
together. We will show that if a certain condition holds we can associate an
operator with every operation such that the probability for a circuit formed
from operations is given by the trace of the corresponding operator
expression. For example,

$$\mathrm{Prob}\left(\mathsf{A}^{a_1 b_2} \mathsf{B}_{b_2}^{c_3 a_4} \mathsf{C}_{a_1 c_3 a_4}\right) = \hat{A}^{a_1 b_2} \hat{B}_{b_2}^{c_3 a_4} \hat{C}_{a_1 c_3 a_4} \tag{17}$$

This same example in diagrammatic form is

[Figure: the probability of the circuit built from the operations A, B and C, set equal to the corresponding circuit built from operators] (18)

If we can calculate the probability for a circuit from the trace under such a 
mapping from operations to operators in this way, we will say that operations 
correspond to operators. Note that this is different from the usual formulation 
in which completely positive maps are associated with operations. 

We will see that operators, such as $\hat{B}_{a_1}^{b_2}$, can sensibly be associated with
operations if, after taking the input transpose (this is the partial transpose over
the input part of the space), we get a positive operator. A complete set of
physical operators, $\{\hat{B}_{a_1}^{b_2}[l] : l = 1 \text{ to } L\}$, has the property that every operator
in the set has positive input transpose and, further,

$$\sum_{l=1}^{L} \hat{B}_{a_1}^{b_2}[l] \, \hat{I}_{b_2} = \hat{I}_{a_1} \tag{19}$$

where $\hat{I}_{a_1}$ is the identity operator acting on $\mathcal{H}_{a_1}$.

We will show that quantum theory for finite dimensional Hilbert spaces can 
be formulated rather succinctly with the following two axioms. 

Axiom 1 Operations correspond to operators. 

Axiom 2 Every complete set of physical operators corresponds to 
a complete set of operations. 

The operators here are understood to act on a complex Hilbert space. A
complete set of operations is a set of operations corresponding to the same setup
(same apparatus with the same settings) where the associated outcome sets are
disjoint and have union equal to the full set of outcomes for this setup.

Axiom 1 tells us that we can calculate probabilities for circuits by
corresponding operator expressions as in (17, 18). Axiom 2 guarantees that
probabilities are non-negative (the requirement that the operators have positive
input transpose imposes this) and that the sum of the probabilities over all
outcomes adds up to one as required (the condition in (19) imposes this).

2.3 The reconstruction 

In Part IV we will show that classical probability theory and quantum theory
are the only two theories consistent with the following postulates within the 
circuit framework. 

P1 Sharpness. Associated with any given pure state is a unique maximal effect
giving probability equal to one. This maximal effect does not give
probability equal to one for any other pure state. 

P2 Information locality. A maximal measurement on a composite system is 
effected if we perform maximal measurements on each of the components. 

P3 Tomographic locality. The state of a composite system can be determined 
from the statistics collected by making measurements on the components. 

P4' Permutability. There exists a reversible transformation on any system ef- 
fecting any given permutation of any given maximal set of distinguishable 
states for that system. 

P5 Sturdiness. Filters are non-flattening. 

We can single out quantum theory by adding anything that is inconsistent with 
classical probability theory yet consistent with quantum theory. One way to do 
this is to add the word "compound" to postulate P4': 

P4 Compound permutability. There exists a compound reversible transforma- 
tion on any system effecting any given permutation of any given maximal 
set of distinguishable states for that system. 

A compound transformation is one that can be made from two sequential
transformations (neither equal to the identity). P4 fails, in particular, for a classical
bit. We will discuss these postulates in Sec. 9. In the three sections after that
we will show in detail how to reconstruct quantum theory (as given by the above
two mathematical axioms) from these postulates. Sec. 10 and Sec. 11 can be
read without reading the sections on duotensors (Part III). Sec. 12 requires the
techniques developed in Part III.
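
The claim that P4 fails for a classical bit can be checked by brute force. The sketch below assumes that the reversible transformations of a classical system with n distinguishable states are exactly the permutations of those states; the function names are illustrative. It collects every permutation that can be written as a product of two non-identity permutations.

```python
from itertools import permutations

def compose(p, q):
    """Apply permutation q first, then p (permutations stored as tuples)."""
    return tuple(p[q[i]] for i in range(len(q)))

def compound_permutations(n):
    """Permutations of n states obtainable from two sequential non-identity permutations."""
    identity = tuple(range(n))
    non_identity = [p for p in permutations(range(n)) if p != identity]
    return {compose(p, q) for p in non_identity for q in non_identity}

bit_flip = (1, 0)
print(bit_flip in compound_permutations(2))   # is the bit flip compound?
print(len(compound_permutations(3)))          # how many of the 6 trit permutations are compound?
```

For the bit, the only non-identity permutation is the flip, and two flips give the identity, so the flip itself is not compound. For three states every permutation turns out to be compound, which is why the bit is singled out.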

In Sec. 10 we prove numerous results using only P1, P2, P3 and P4' (i.e.
without using the assumption of P5 that filters are non-flattening).

We will show that P1 implies causality - namely that the future cannot
influence the past. This means we do not need this as a separate assumption.
We will show that there exists a reversible transformation between any pair
of pure states. The proof of this uses the following construction

[Diagram (20): circuit built from transformations P and Q acting on a composite of systems a and b; not recoverable from the source.]

where P and Q are transformations that permute a maximal set of distinguishable
states for the composite system entering them. By choosing appropriate
permutations the incoming state is mapped on to the b system then mapped
back onto the a system.

We define systems as being the thing we have after a filter. By showing 
that two filters acting in parallel act as a filter on the composite, we are able 
to show that the composite of two systems is a system itself. We show how to 
construct arbitrary filters using a transformation similar to (20) but where Q is
the inverse of P. Since we have arbitrary filters we have arbitrary systems. 

Let $N_a$ be the maximum number of distinguishable states for systems of
type a. We show that if $N_a = N_b$ then systems of type a and b have the
same properties. We also show that $K_a = N_a^r$ where $K_a$ is the number of
probabilities required to specify the state and $r$ is an integer greater than or
equal to one.
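
For orientation, the number of probabilities K needed to specify a state grows as a power r of the number N of distinguishable states. Classical probability theory has r = 1. In quantum theory the (unnormalized) states are Hermitian operators on an N-dimensional Hilbert space, which form a real vector space of dimension N squared, so r = 2. The sketch below, assuming NumPy and using an illustrative function name, builds a real basis for the Hermitian operators and counts it.

```python
import numpy as np

def hermitian_basis(n):
    """A basis, over the reals, for the Hermitian operators on an n-dimensional Hilbert space."""
    basis = []
    for j in range(n):
        for k in range(j, n):
            if j == k:
                m = np.zeros((n, n), dtype=complex)
                m[j, j] = 1
                basis.append(m)
            else:
                # Symmetric real part and antisymmetric imaginary part.
                re = np.zeros((n, n), dtype=complex)
                re[j, k] = re[k, j] = 1
                im = np.zeros((n, n), dtype=complex)
                im[j, k] = -1j
                im[k, j] = 1j
                basis.extend([re, im])
    return basis

for n in (2, 3, 4):
    print(n, len(hermitian_basis(n)))
```

The count is n diagonal elements plus two elements for each of the n(n-1)/2 off-diagonal pairs, giving n squared in total.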

In Sec. 11 we consider systems having $N_a = 2$. We call such systems generalized
bits (or gebits for short). First we show that pure states for a gebit
are represented by points on a hypersphere. In the next step we use P5 (that
filters are non-flattening) for the first time. We use this to show that all points
on the hypersphere represent pure states for a gebit. We do this by considering
filtering on a getrit (a system having $N_b = 3$) to put constraints on a gebit.
Finally we show that the hypersphere must, in fact, be a 2-sphere. This means
it corresponds to the standard Bloch sphere of quantum theory. This proof uses
an ingenious technique developed by Chiribella, D'Ariano, and Perinotti [14, 13]
based on teleportation. This implies that $r = 2$, or $K_a = N_a^2$. It is by this means
that the simplicity axiom of [36] is eliminated.

In Sec. 12 we finally reconstruct quantum theory for systems of arbitrary $N_a$
by establishing that there is a correspondence between operations and operators
and showing that we can construct a complete set of operations corresponding
to every complete set of physical operators. Indeed, we show that the set of 
operations

[Diagram (21): the set of operations in question; not recoverable from the source.]

can be set to correspond to any complete set of positive operators. Note the set
is generated by considering different outcomes $l$.
Part II

The circuit framework

3 Operational description in the circuit framework

In this section and the following two sections we will describe the circuit frame- 
work to which the postulates will be applied. In this section we will deal with 
the operational description of circuits. 

3.1 Apparatuses 

Physically we perform experiments by placing apparatuses (such as lasers, beam- 
splitters, lenses, . . . ) next to each other in an appropriate way. 

Apparatuses: An apparatus, A, is a physical device having: (i) a means for
determining what constitutes a single use of this apparatus (this could
be given by gating the use of the apparatus with respect to an external
clock); (ii) apertures which can be placed next to apertures on other
apparatuses; (iii) settings (fixed, for example, by setting some knobs); and
(iv) outcomes, denoted by $x_A$ (read off a meter for example).

Apertures: An aperture is a hole or the like that allows one use
of an apparatus to be connected to another use of an apparatus (in such a
way that we can imagine a system passing from one apparatus use to the next
apparatus use). In the case that the two apparatus uses are sequential
uses of the same apparatus then we can think of the aperture as simply 
corresponding to the spacetime region which interfaces these two uses. 

In this paper we are considering a limited class of experiments, namely those 
corresponding to linking up apparatuses using apertures. This is sufficient for 
the purposes of describing the kinds of experiments which are done to test 
classical probability theory and quantum theory. However, we can imagine 
more general notions of apparatus that can be linked up in other ways [42] . 
It seems likely that we need a more general notion of apparatuses to give an 
operational account of general relativity for example. 

3.2 Operations 

We can extract another notion, that of the operation, which we obtain by adding 
some structure to the notion of an apparatus use. The point of this extra 
structure is, as we will see below, to prescribe the ways in which we use the 
apparatuses. 

Operations: An operation, A, corresponds to a single use of an apparatus
where (a) we identify inputs with some apertures, (b) we identify outputs
with some apertures, (c) we fix the setting (select particular knob settings)
and restrict our attention to a specified set of outcomes (in the simplest
case we restrict our attention to a set containing a single outcome).

Inputs and outputs: An input corresponds to using an aperture to allow a 
system of a specified type to pass into the apparatus. An output corre- 
sponds to using an aperture to allow a system of a specified type to pass 
out of the apparatus. 

Types: When we specify an input or output we must also specify the type. 
The type corresponds to the kind of systems (electrons, photons, small 
rocks, . . . ) we use an aperture for. 

Settings: The setting is part of the specification of the operation. We denote 
it by s(A). If we have a different setting for the same apparatus then we 
have a different operation. 

Outcome set: Each operation has an outcome set, denoted by o(A), associated
with it. If $x_A \in o(A)$ then we say operation A "happened". The outcome
set is part of the specification of A.

Compatible operations: Operations are said to be compatible if they corre- 
spond to the same apparatus use and the same knob settings but have 
different outcome sets. We will denote such operations by A[i] and the
corresponding outcome sets by $o_i(A)$.

A complete set of operations is a set of operations that are compatible, 
whose outcome sets are disjoint, and where the union of these outcome 
sets is equal to the set of all possible outcomes. 
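
The two conditions on a complete set of operations (disjoint outcome sets whose union is the set of all outcomes) are easy to check mechanically. The following sketch represents each compatible operation only by its outcome set; the function name and the four-valued meter are hypothetical.

```python
def is_complete_set(outcome_sets, all_outcomes):
    """True if the outcome sets are pairwise disjoint and their union is all_outcomes."""
    seen = set()
    for o in outcome_sets:
        o = set(o)
        if seen & o:          # overlaps an earlier outcome set
            return False
        seen |= o
    return seen == set(all_outcomes)

# Hypothetical apparatus whose meter can read 1, 2, 3 or 4.
all_outcomes = {1, 2, 3, 4}
print(is_complete_set([{1}, {2, 3}, {4}], all_outcomes))     # disjoint and exhaustive
print(is_complete_set([{1, 2}, {2, 3}, {4}], all_outcomes))  # outcome 2 appears twice
print(is_complete_set([{1}, {2}], all_outcomes))             # outcomes 3 and 4 are missing
```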

We use the following notation to represent operations

[Diagram: a box labelled A with inputs a, b, b entering at the bottom and outputs b, c leaving at the top]   $\Leftrightarrow$   $A^{b_4 c_5}_{a_1 b_2 b_3}$   (22)

On the left hand side we have diagrammatic notation, on the right hand side
we have symbolic notation. In the diagrammatic notation, inputs enter at the
bottom of the box and outputs leave at the top of the box. In the symbolic
notation inputs appear as subscripts and outputs appear as superscripts. When
we use symbolic notation it is necessary to label the inputs and outputs with
integers so we can identify which outputs are connected to which inputs (see
below). In the diagrammatic notation these integers are not necessary since we
can see, by looking at the diagram, where the wires go. Note that the knob
setting and outcome set are taken to be absorbed into the specification of the
operation, A, so we do not represent them explicitly in this notation.

3.3 Wires



Experiments are performed by placing apparatuses next to each other. In practice
we can place apparatuses next to each other in any way allowed by their
physical geometry. For example, all the apparatuses can be piled into a box kept 
in a dark dusty corner of the laboratory. We do not, generally, expect useful 
physics to come out of such haphazard arrangements. For this reason we have 
introduced the notion of operations which, in conjunction with the following 
wiring rules, prescribe the use of apparatuses when building circuits. 

Wires: A wire corresponds to placing two apertures next to each other. For 
any collection of operations connected by wires we demand 

Directed: A wire connects an output to an input. 

One wire: At most one wire can be connected to any given input or 
output. 

Type matching: Wires can only connect outputs to inputs of the same 
type. The wire therefore has an associated type, denoted a, b, ... 

No closed loops: Wires are directed (they go from output to input). We 
demand that if we trace forward along wires through the operations 
then we cannot get back to the same operation. Since operations 
correspond to single uses of apparatuses, this corresponds to ruling 
out closed time-like loops. 
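
The four wiring rules can be read as a validity check on a directed graph. The sketch below is one possible encoding, not the paper's: operations are dictionaries of typed input and output ports, a wire is a tuple `(source_op, output_port, target_op, input_port)`, and all names are hypothetical.

```python
from collections import defaultdict

def check_wiring(operations, wires):
    """Return the list of violated wiring rules (empty if the wiring is allowed).

    operations: {name: {"inputs": {port: type}, "outputs": {port: type}}}
    wires: list of (source_op, output_port, target_op, input_port)
    """
    problems = []
    used_outputs, used_inputs = set(), set()
    successors = defaultdict(set)
    for src, out_port, dst, in_port in wires:
        out_type = operations[src]["outputs"].get(out_port)
        in_type = operations[dst]["inputs"].get(in_port)
        if out_type is None or in_type is None:
            problems.append("directed")       # must run from an output to an input
            continue
        if (src, out_port) in used_outputs or (dst, in_port) in used_inputs:
            problems.append("one wire")
        if out_type != in_type:
            problems.append("type matching")
        used_outputs.add((src, out_port))
        used_inputs.add((dst, in_port))
        successors[src].add(dst)

    # No closed loops: repeatedly remove operations that have no remaining predecessor.
    indegree = {name: 0 for name in operations}
    for src in list(successors):
        for dst in successors[src]:
            indegree[dst] += 1
    ready = [name for name, d in indegree.items() if d == 0]
    removed = 0
    while ready:
        name = ready.pop()
        removed += 1
        for nxt in successors.get(name, ()):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if removed != len(operations):
        problems.append("no closed loops")
    return problems

ops = {
    "A": {"inputs": {"in": "a"}, "outputs": {"out": "a"}},
    "B": {"inputs": {"in": "a"}, "outputs": {"out": "a"}},
}
print(check_wiring(ops, [("A", "out", "B", "in")]))
print(check_wiring(ops, [("A", "out", "B", "in"), ("B", "out", "A", "in")]))
```

The loop check is a standard topological-sort argument: if some operations can never be removed, they sit on a directed cycle, which is exactly what the "no closed loops" rule forbids.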

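We use the following notation (diagrammatic on the left, symbolic on the right)
to represent two operations being joined by a wire:

[Diagram and symbolic expression (23): operation A, with inputs $a_1 a_2 b_3$, wired to operation B, with inputs $a_6 b_4$; the output labels are not recoverable from the source.]
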

In the symbolic notation, the wire is represented by a repeated index. This 
is the reason we have to label the inputs and outputs with integers. These 
integers are just labels and have no meaning beyond the fact that they tell us 
which inputs and outputs are joined. We could permute the integers in any way 
without changing the physical meaning. 

The physical meaning of the wires is that they tell us which input and output 
apertures are placed immediately next to one another. This is a little like the 
diagram that often accompanies a self-assembly piece of furniture. This diagram 
shows an exploded view of the piece of furniture with lines drawn from one piece 
to another showing how it is to be assembled. The main difference here is that an
experiment may have moving parts and so our diagrams constitute something
that happens in spacetime, whereas the piece of furniture is a static object.
Note, in particular, that the wires in the diagram do not correspond to actual
wires. If we have actual wires in an experiment (such as optical fibers) then
we must treat them as operations (they have an input and an output, and are
connected up to other operations). Only in the idealized case where physical 
wires correspond to the identity transformation could we consider treating them 
as wires as considered here. 

Type matching ensures that we use the apparatuses in the way they are 
intended to be used. For example, it prohibits us from matching an output for 
small rocks with an input for photons. In practice, there is nothing to prevent 
us from mismatching types in this way. However, it would probably lead to 
the malfunctioning of the apparatuses. In such a circumstance, any operational 
laws of physics we have will not enable us to make predictions. This is a genuine 
limitation of the operational approach to physics. Physics should tell us what 
will happen in any circumstance. 

3.4 Fragments 

The object in (23) above is an example of a fragment.

Fragments: A fragment is formed by wiring together a bunch of operations. 
We will denote fragments by uppercase sans serif A, B, etc. (just as for 
operations which are, in fact, special cases of fragments). Note that a 
fragment can consist of disjoint parts (not connected by wires). 

Example: For example, let the fragment E be given by

[Diagram and symbolic expression (24): fragment E, built from operation A and two uses of operation B; details not recoverable from the source.]


Notice that the operation B is used twice here (this corresponds to two 
separate uses of the same type of apparatus). 

Features: A fragment will, in general, have some open inputs and outputs left 
over. In particular, it may have outputs which could, in principle, be
wired into inputs on the same fragment without violating the no closed
loops assumption. It can be a part of a much bigger fragment. A frag- 
ment is therefore the circuit language equivalent of an arbitrary region of 
space-time. We allow a fragment to consist of disjoint parts (the circuit 
equivalent of an arbitrary region of space-time consisting of disjoint parts). 
The outcome $x_E$ is given by specifying the outcome at each operation making
up the fragment. In specifying the fragment, E, we give an outcome
set o(E) (this is the cartesian product of the outcome sets for each operation),
the settings s(E) (specifying this means specifying a setting at each
operation), and the wiring w(E). We will usually denote the settings and
wiring by sw(E) for brevity. The statement that a given fragment "has
happened" means that the setup with all the corresponding apparatuses
placed together in accordance with the given settings and wiring, sw(E),
was implemented and the outcome, $x_E$, was in the given outcome set, o(E).

Setups: Each fragment naturally belongs to a class of fragments corresponding 
to the same apparatus uses with the same knob settings and the same 
wiring but having different outcome sets. We will say that these fragments 
correspond to "the same setup". We will denote the members of such a 
set of fragments as E[i], the outcome sets by $o_i(E)$, and the settings and
wirings by sw(E).

Deterministic fragments: A deterministic fragment is one for which the set 
of outcomes is equal to the set of all possible outcomes. The set of
all outcomes is the cartesian product of the sets of all outcomes at each 
operation constituting the fragment. Since the outcome must be in the 
set of all possible outcomes, deterministic fragments always happen when 
the corresponding setup is put in place (we borrow this terminology from 



3.5 Circuits 

We now introduce an important notion. 

A circuit is formed when we wire together a bunch of operations and have no 
open inputs or outputs left over. Circuits are special cases of fragments. 
We will denote them by uppercase sans serif font, A, B, . . . . For example, 
let the circuit H be

[Diagram: circuit H; not recoverable from the source.]

A circuit can consist of disjoint parts.

The outcome set for a circuit is given by the cartesian product of the outcome 
sets for each of the operations making up the circuit. We say that the 
circuit has "happened" if the outcome seen at each apparatus use is in the 
outcome set of the corresponding operation. 
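
To make the cartesian-product rule concrete, the sketch below represents a circuit only by the outcome sets of its operations. All names and values are illustrative.

```python
from itertools import product

def circuit_outcome_set(operation_outcome_sets):
    """Outcome set of a circuit: cartesian product of its operations' outcome sets."""
    return set(product(*operation_outcome_sets))

def happened(observed, operation_outcome_sets):
    """The circuit 'happened' if each observed outcome lies in the matching outcome set."""
    return all(x in o for x, o in zip(observed, operation_outcome_sets))

# Hypothetical circuit made of three operations.
outcome_sets = [{1, 2}, {"up"}, {0, 1}]
print(len(circuit_outcome_set(outcome_sets)))
print(happened((2, "up", 0), outcome_sets))
print(happened((3, "up", 0), outcome_sets))
```

Checking each apparatus use separately, as `happened` does, is equivalent to asking whether the tuple of observed outcomes belongs to the product set.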

3.6 Proto-systems 

We will later give a definition of what constitutes a system using the notion of 
filters. The idea is that a system is what we have after a filter. One example of 
a filter is the "do nothing" filter (i.e. just the identity). This allows us to define 
a special case of system which we will call proto-systems. A proto-system is 
what we have after a "do nothing" filter. All proto-systems are systems. 

To define proto-systems first we think about how we can break up circuits 
into fragments. In fact we can do this in any arbitrary way. One particular way 
we can do it is with synchronous sets of wires. 

A synchronous set of wires: is a set of wires having the property 
that it is impossible to trace forward from any wire in the set to 
any other wire in the set. By tracing forward we mean tracing along 
wires from output to input through the operations. 
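
The definition of a synchronous set is a reachability condition, which the following sketch checks directly. It assumes a circuit without closed loops, encoded as a dictionary from a wire label to the pair `(source_op, target_op)`; this encoding and the names are assumptions made for illustration.

```python
def is_synchronous(wires, chosen):
    """True if no chosen wire can be reached by tracing forward from another chosen wire.

    wires: {wire_id: (source_op, target_op)}; chosen: iterable of wire ids.
    """
    outgoing = {}
    for wid, (src, _dst) in wires.items():
        outgoing.setdefault(src, []).append(wid)

    def wires_ahead(start):
        """All wires reachable by tracing forward from the wire `start`."""
        found, visited_ops = set(), set()
        stack = [wires[start][1]]
        while stack:
            op = stack.pop()
            if op in visited_ops:
                continue
            visited_ops.add(op)
            for wid in outgoing.get(op, []):
                found.add(wid)
                stack.append(wires[wid][1])
        return found

    chosen = set(chosen)
    return all(not (wires_ahead(w) & chosen) for w in chosen)

# Hypothetical diamond-shaped circuit: A feeds B and C, which both feed D.
wires = {"w1": ("A", "B"), "w2": ("A", "C"), "w3": ("B", "D"), "w4": ("C", "D")}
print(is_synchronous(wires, ["w1", "w2"]))
print(is_synchronous(wires, ["w1", "w3"]))
```

In the diamond, `w1` and `w2` lie side by side, while `w3` sits downstream of `w1`, so only the first pair is synchronous.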

For example, the three wires picked out below

[Diagram (26): a circuit with three wires picked out; not recoverable from the source.]

constitute a synchronous set. A complete set of synchronous wires, or a
hypersurface, is a synchronous set of wires which partitions the circuit into two. We
can foliate a circuit with hypersurfaces. For example, a complete foliation (one
which includes every wire at least once) is given below.

[Diagram: a complete foliation of a circuit; not recoverable from the source.]

It is always possible to provide a complete foliation for a circuit [30]. 
We now define 

A proto-system is associated with a wire or synchronous set of 
wires. The system type is determined by the wire types. 

For example, a proto-system of type aabc is associated with a set of wires of 
type a, a, b, and c. The proto-system aabc can, for example, be regarded as 
a composite with components of types a, ab, and c. As a matter of notation, 
we will sometimes refer to a proto-system that may be composite with a single 
letter. For example, the system aabc could be denoted by the symbol d. 

3.7 Preparations, transformations, and results 

We can partition a circuit into parts using systems. The resulting fragments are 

Preparations. Any fragment having open outputs but no open inputs is a
preparation. Here are some examples:

[Diagram (28): two example preparations; not recoverable from the source.]

The outputs necessarily constitute a synchronous set of wires (when a 
preparation is wired up to another fragment). Hence, preparations can be 
thought of as preparing a proto-system in some given state. 

Transformations. A transformation is a fragment having open inputs and
outputs that is used in transformation mode (we will explain what this
means below). Here are some examples:

[Diagram (29): two example fragments with open inputs and outputs, involving operations A, B and C; not recoverable from the source.]

The fragment on the right could have the output fed either directly into 
the input or indirectly (via some other operations). If this is the case then 
the fragment is not being used in transformation mode. For us to say that 
a fragment is being used in transformation mode it must be the case that 
the outputs of that fragment lie on a later hypersurface than the inputs. 
This means that we should not be able to trace forward from an output 
along wires to an input on the fragment. Any fragment having some open 
inputs and outputs can be put in transformation mode. 

Results. These are fragments having open inputs and no open outputs. Here 
are some examples 



The inputs necessarily constitute a synchronous set of wires (when wired 
up to another fragment). Results can be thought of as corresponding to 
measurement outcomes. 

We note that there are fragments which cannot be understood as being equal 
to one of these special types (namely those fragments not bounded by input 
and output synchronous sets of wires). Nevertheless, any circuit can be broken 
up into preparations, transformations, and results. This is clearly true because, 
at the most fine grained level, operations always correspond to a preparation, 
transformation, or result. Having identified a given preparation, transformation, 
or result, we can use it to build different circuits. 

Deterministic preparations, transformations, and results are ones for which 
the set of outcomes is equal to the set of all possible outcomes. 

A measurement is made up of compatible results. We define: 

A measurement, $\{B_a[l] : l = 1, 2, \dots\}$, is a collection of results
corresponding to the same setup having outcome sets, $o_l(B)$, which
are disjoint and whose union is the set of all outcomes. We can
simply say that the outcome of the measurement is $l$ (corresponding
to the label of the outcome set).



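[Diagram (30): examples of results (the fragments referred to under "Results" above), one involving an operation D; not recoverable from the source.]
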
4 Probabilities in the circuit framework



Up to now we have simply discussed the operational description of experiments.
Physical theories go beyond mere description. They make predictions. In the
context of the circuit framework predictions concern probabilities. In this
section we will introduce probabilities. We need this so that we can introduce
the notion of state and other related concepts in the next section.

Modern physics makes extensive use of probability. However, there is much 
debate about what the correct interpretation of probability is. There are various 
options. It could be interpreted as a relative frequency, as a propensity, as an 
objective degree of belief, as a subjective degree of belief, or some hybrid of these 
[30]. There are serious problems with all these approaches. Fuchs, who adopts
the point of view that probabilities are subjective degrees of belief, believes that the
quantum formalism may follow from a proper understanding of how to interpret
probability [28]. This is a significantly deeper point of view than adopted in the
present work. We will simply define probabilities to have certain mathematical
properties (the usual mathematical properties) without attempting to provide
an interpretation of probability or justification for these properties. Ultimately,
however, the goal of understanding the nature of probability in the context of 
physics is likely to play an important role in resolving foundational issues and 
pushing physics forward beyond quantum theory. 

4.1 Assigning probabilities 

Whichever interpretation of probability we adopt, at some point we will wish 
to assign a probability for something to happen. In our context this means 
assigning a probability for a fragment to happen. If we do allow ourselves to
assign a probability with a fragment then we denote this by

$\mathrm{Prob}(A) := \mathrm{Prob}(x_A \in o(A) \mid sw(A))$ (31)

Here sw(A) stands for the settings and wiring of the fragment A. Importantly, 

We do not assume we can always assign probabilities. 

In particular, we may have a fragment with open inputs and outputs. The 
probability of the fragment happening may depend, for example, on what we 
send into the open inputs. That is, the probability may depend on conditions 
that are not given. In these circumstances the actions of some adversary (who 
has control over what is sent into the inputs for example) may influence how 
likely the fragment is to happen and so we should not assign a probability. In 
general we will allow ourselves to assign a probability with a fragment when 
we know it is independent of the actions of any adversary who has control over 
parts of the world not associated with the given fragment (e.g. the choice of 
settings, wiring, and outcome sets for operations that are not part of the given 
fragment). 

We will consider fragments resulting from wiring together two fragments. 
Thus, if we have fragments A and B then we will represent a fragment resulting 
from wiring them together as AB. We have suppressed the input and output
labels here because we are dealing with the general case. There may be more
than one way of wiring together two fragments to obtain a bigger fragment. One
special case that is always possible is where we do not place any wires between
them. For example, putting back in the subscripts and superscripts, AB could
represent any of several differently wired combinations of the same two fragments
[the source lists three symbolic expressions here; their index labels are not
recoverable]. We will consider joint
probabilities for such fragments. We will denote the joint probability by

$\mathrm{Prob}(AB) := \mathrm{Prob}(x_{AB} \in o(AB) \mid sw(AB))$ (32)

We will also consider assigning conditional probabilities. We denote conditional 
probabilities by 

$\mathrm{Prob}(A|B) := \mathrm{Prob}(x_A \in o(A) \mid x_B \in o(B), sw(AB))$ (33)

We have suppressed the subscripts and superscripts in the notation AB because 
we are dealing with the general case. 

The idea of assigning a probability is essentially a primitive here. We will 
allow ourselves to associate a probability (or, more generally, a conditional prob- 
ability) in certain yet to be specified circumstances (when the actions of an 
adversary would make no difference) . Exactly what it means to assign a prob- 
ability will depend on which interpretation of probability one adopts. 

The idea that one should not always allow oneself to assign a probability 
does not seem to be much discussed in the literature. However, it is a fact that 
the physical theories we have only allow us to calculate probabilities for very 
special situations (see examples in Sec. 4.5). These situations are, generally,
defined by the causal structure assumed to be operating in the background. For 
example, we would not make a probabilistic prediction for the 20th time step 
given only information about the 5th time step (and no information about what 
happened in between). But we may be able to make a probabilistic prediction 
for the 20th time step given information about the 19th time step (though even 
here we have to be careful - see the second example in Sec. 4.5). The fact
that physical theories only allow us to calculate probabilities for rather special 
situations is not normally made explicit. In this paper we wish to take a more 
general point of view and so we do need to be explicit about this. 

4.2 Properties of probabilities 

We wish to demand the following properties of probabilities. 

Non-negative. If we can assign a probability Prob(A) for some fragment A 
then 

$0 \leq \mathrm{Prob}(A)$ (34)

Deterministic fragments. For any deterministic fragment, $A[\Omega]$ (where $\Omega$ is
understood to be the set of all outcomes), we can assign a probability and
this probability is equal to one, i.e.

$\mathrm{Prob}(A[\Omega]) = 1$ (35)
Additivity. Let $A[\mu]$, $A[\nu]$, and $A[\mu \cup \nu]$ be three fragments associated with the
same setup having outcome sets equal to $\mu$, $\nu$, and $\mu \cup \nu$ respectively. Let $\mu$
and $\nu$ be disjoint. If we can assign probabilities to any two of $\mathrm{Prob}(A[\mu])$,
$\mathrm{Prob}(A[\nu])$, and $\mathrm{Prob}(A[\mu \cup \nu])$, then we can assign a probability to the
third and, further,

$\mathrm{Prob}(A[\mu \cup \nu]) = \mathrm{Prob}(A[\mu]) + \mathrm{Prob}(A[\nu])$ (36)

Joint probabilities. Let $A[\Omega]$ be a deterministic fragment and B be another
fragment. Let $A[\Omega]B$ be a fragment resulting from wiring together these
two fragments. Then if we can assign a probability to either of $\mathrm{Prob}(A[\Omega]B)$
and $\mathrm{Prob}(B)$ then we can assign a probability to the other and, further,

$\mathrm{Prob}(A[\Omega]B) = \mathrm{Prob}(B)$ (37)

Conditional probabilities. Let A, B, be two fragments and let AB be a fragment
that results from wiring them together. If we can assign probabilities
Prob(AB) and Prob(B) and if, further, Prob(B) $\neq$ 0, then we can assign a
conditional probability Prob(A|B) where

$\mathrm{Prob}(A|B) = \frac{\mathrm{Prob}(AB)}{\mathrm{Prob}(B)}$ (38)

(in some treatments this is regarded as a definition of conditional probability).

These are the standard properties required of probabilities with the important
additions that we have taken some care to deal with the fact that we cannot
always assign probabilities and we have stated them for fragments (comprised
out of operations and wires). Note that we only consider finite $|\Omega|$ so we do
not concern ourselves with measure theoretic concerns that might result from
having an infinite number of outcomes. This is consistent with the operational
approach taken in this paper - the data taken in any real experiment can only
have finite $|\Omega|$. We note the following

1. It follows from (34), (35) and (36) that probabilities must be less than or
equal to one.

2. We can write

$\mathrm{Prob}(AB) = \mathrm{Prob}(A|B)\,\mathrm{Prob}(A[\Omega]B)$ (39)

but only with a caveat. In the case where Prob(B) $\neq$ 0, equation (39)
follows from (37) and (38). The caveat concerns the case when Prob(B) = 0. It
follows from the properties of probabilities (in particular, non-negativity (34),
additivity (36), and (37)) that we must also have Prob(AB) = 0. In this case we cannot
associate a conditional probability Prob(A|B). However, no matter what
value we put in the place of Prob(A|B) (as long as it is in the range 0 to 1)
equation (39) will be satisfied (with both sides being equal to zero). We
will understand this equation in this way. Note, by way of example, that
the proof given in Sec. 4.4 below works well with this understanding of
(39).
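
The first remark can be spelled out in one line. The sketch below uses only non-negativity (34), the deterministic-fragment property (35) and additivity (36), and assumes that a probability can be assigned to $A[\mu]$; the symbol $\bar{\mu}$, denoting the complement of $\mu$ in the set of all outcomes $\Omega$, is introduced here for convenience.

```latex
1 = \mathrm{Prob}(A[\Omega])
  = \mathrm{Prob}(A[\mu \cup \bar{\mu}])
  = \mathrm{Prob}(A[\mu]) + \mathrm{Prob}(A[\bar{\mu}])
  \geq \mathrm{Prob}(A[\mu])
```

The first equality is (35), the third is (36) applied to the disjoint sets $\mu$ and $\bar{\mu}$, and the final inequality is (34) applied to $A[\bar{\mu}]$.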
4.3 Assumption 1 for the circuit framework



To get ourselves going we need some situation in which we can assign probabili- 
ties. A circuit does not have open inputs or outputs so it is reasonable to assume 
that it is not subject to outside influences. We make the following assumption. 

Assump 1. We can assign a probability with any given circuit (the 
probability that the circuit "happens" ), and this probability depends 
only on the specification of the given circuit (the knob settings and 
outcome sets at the operations, and the wiring). 

Hence we can meaningfully speak of Prob(A) for any circuit, A. 

It follows from Assump 1 that we can assign a conditional probability 
Prob(A|B) when AB is a circuit. Let $A[\Omega]$ be the deterministic fragment
corresponding to the same setup as A. Then, for the special case where AB is a circuit,
(37) and (38) give

$\mathrm{Prob}(A|B) = \frac{\mathrm{Prob}(AB)}{\mathrm{Prob}(A[\Omega]B)}$ (40)

(as long as $\mathrm{Prob}(A[\Omega]B)$ is non-zero). This is because we can, by Assump
1, assign probabilities for both the numerator and the denominator (as they
correspond to circuits).



4.4 Probabilities factor for composite circuits 

It follows from Assump 1 that if A and B are both circuits then, for the 
composite circuit AB (consisting of the disconnected parts A and B), we have 

Prob(AB) = Prob(A)Prob(B) (41) 

because 

$\mathrm{Prob}(AB) = \mathrm{Prob}(x_A \in o(A), x_B \in o(B) \mid sw(A), sw(B))$

$= \mathrm{Prob}(x_A \in o(A) \mid x_B \in o(B), sw(A), sw(B))\,\mathrm{Prob}(x_B \in o(B) \mid sw(A), sw(B))$
$= \mathrm{Prob}(x_A \in o(A) \mid sw(A))\,\mathrm{Prob}(x_B \in o(B) \mid sw(B))$

where we use (39) in the second line and Assump 1 in the third line. We can
write this proof more succinctly as

Prob(AB) = Prob(A|B)Prob(B) = Prob(A)Prob(B) (42) 

Note that if Prob(B) = 0 then we have to be careful using the conditional
probability since then it is not clear we can assign it using (38). However, we
still have Prob(AB) = Prob(A)Prob(B) since, as noted in Sec. 4.2,
Prob(B) = 0 implies Prob(AB) = 0.
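
The factorization (41) can be illustrated numerically in the quantum case, where two disconnected circuits are described by a tensor product. The sketch assumes NumPy and two hypothetical single-qubit circuits, each a preparation followed by a result; the operators chosen are illustrative only.

```python
import numpy as np

def prob(state, effect):
    """Probability for a 'preparation then result' circuit: Tr(state @ effect)."""
    return float(np.real(np.trace(state @ effect)))

# Circuit A: hypothetical preparation rho followed by result E.
rho = np.diag([0.7, 0.3])
E = np.diag([1.0, 0.0])

# Circuit B: hypothetical preparation sigma followed by result F.
sigma = np.array([[0.5, 0.5], [0.5, 0.5]])
F = np.diag([0.0, 1.0])

p_A = prob(rho, E)
p_B = prob(sigma, F)
# The composite circuit AB, with no wires between A and B.
p_AB = prob(np.kron(rho, sigma), np.kron(E, F))
print(p_A, p_B, p_AB)
```

The agreement of `p_AB` with `p_A * p_B` reflects the identity Tr[(X ⊗ Y)(E ⊗ F)] = Tr(XE) Tr(YF); in the framework of this paper the factorization is derived from Assump 1 rather than from any particular operator representation.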

Chiribella, D'Ariano, and Perinotti take the factorization of probabilities for 
disjoint circuits as a starting point in their circuit model [14, 13].
4.5 Well conditioned probabilities

The probability Prob(A|B) makes sense if AB is a circuit but what if it is a 
fragment? We cannot assume that we can assign probabilities with arbitrary 
fragments in general as they may have open inputs and outputs. However, there 
may be particular situations in which we can meaningfully speak of probabilities 
for fragments even when they do have open inputs and outputs. To this end,
the following definition is useful.

Well conditioned probabilities: If A and B are fragments, we 
will say that we have a well conditioned probability, Prob(A|B), if 

Prob(A|BC) = Prob(A|BD) (43) 

for all fragments C and D such that ABC and ABD are circuits. 

If we have a well conditioned probability, Prob(A|B), it is fully determined
by A and B as long as the fragment AB is part of a bigger circuit. In such
circumstances we can meaningfully assign a probability with Prob(A|B). Note,
however, the important caveat that this is true as long as AB is part of a bigger
circuit. If we have a setup in which we do not close all open inputs and outputs 
(so the fragment is not part of a bigger circuit) then it is not clear whether we can 
expect any theory to be useful. If we had open inputs not connected to some 
type-matched outputs then, in principle, anything might enter the apparatus 
through them. There is no guarantee that such things would not damage the 
experimental equipment (in which case we could not expect our theory to make 
useful predictions at all). For example, we may leave open inputs that are meant 
for photons but may happen to have small rocks impinging on them. Given 
such considerations, it is good experimental practice to close all open inputs. 
In general, in operational theories, we can only expect reasonable predictions 
when all inputs are type-matched. What about if we have no open inputs but 
still have open outputs? In causal theories there are no influences from the 
future (see T18 in Sec. 10.1). In such theories we can ignore the future. The
action of ignoring the future implements a deterministic result and, under this, 
the outputs are effectively closed. However, (i) the theory under consideration 
may not be causal and, (ii) we may want to allow the situation in which an 
adversary can condition on something that happens in the future (so then we 
cannot ignore what actually happens in the future - see the second example 
below). For these reasons it is good experimental practice to close all outputs
as well as inputs. We will assume that any fragment we consider is part of an 
experiment corresponding to a bigger circuit. 

For the special case where AB is a circuit we saw in Sec. 4.3 that the conditional
probability Prob(A|B) is given by the standard equation

Prob(A|B) . 

since, by Assump 1, we can assign the probabilities given in the numerator 
and denominator. What happens when AB is a fragment? If we have a well 
conditioned probability Prob(A|B) then, for any C such that ABC is a circuit,
Prob(A|B) . P r ob(A]BC) . ggg^ (45) 

Hence, 

if we have well conditioned probabilities Prob(AB) and Prob(A[/]B) (and the 
latter probability is non-zero). In other words, we can use the standard equation 
for calculating conditional probabilities as long as we can assign probabilities 
as required. This is consistent with the properties of probabilities given in Sec. 
IP1 

For generic fragments AB we should not expect to have a well conditioned 
probability Prob(A|B). To illustrate this, consider an experiment from quantum 
physics. Imagine we have a device A ai which prepares a spin half particle
(which we take to be of type a) in the up state followed, in sequence, by two 
spin measurements B a J and C^, along some directions, and then followed by 
an operation D^* which may be a spin measurement or something else (these 
spin measurements are "non-demolition" measurements that allow the system 
to emerge out the other end - i.e. they are transformations in the terminology 
of this paper) . 

[Circuit diagram: the preparation A followed, in sequence, by B, C, and D.] (47)

The following three examples illustrate the notion of a well conditioned proba- 
bility. 

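1. There is not a well conditioned probability
$\mathrm{Prob}(\mathsf{C}_{\mathsf{a}_2}^{\mathsf{a}_3}|\mathsf{A}^{\mathsf{a}_1})$ since what happens
depends on which direction the spin is measured along at B, i.e.

$\mathrm{Prob}(\mathsf{C}_{\mathsf{a}_2}^{\mathsf{a}_3}|\mathsf{A}^{\mathsf{a}_1}\mathsf{B}_{\mathsf{a}_1}^{\mathsf{a}_2}) \neq \mathrm{Prob}(\mathsf{C}_{\mathsf{a}_2}^{\mathsf{a}_3}|\mathsf{A}^{\mathsf{a}_1}\tilde{\mathsf{B}}_{\mathsf{a}_1}^{\mathsf{a}_2})$ (48)

where $\tilde{\mathsf{B}}_{\mathsf{a}_1}^{\mathsf{a}_2}$ is a spin measurement along a different direction from
$\mathsf{B}_{\mathsf{a}_1}^{\mathsf{a}_2}$.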

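2. Perhaps a little more surprisingly, there is not a well conditioned
probability $\mathrm{Prob}(\mathsf{B}_{\mathsf{a}_1}^{\mathsf{a}_2}|\mathsf{A}^{\mathsf{a}_1})$ because

$\mathrm{Prob}(\mathsf{B}_{\mathsf{a}_1}^{\mathsf{a}_2}|\mathsf{A}^{\mathsf{a}_1}\mathsf{C}_{\mathsf{a}_2}^{\mathsf{a}_3}) \neq \mathrm{Prob}(\mathsf{B}_{\mathsf{a}_1}^{\mathsf{a}_2}|\mathsf{A}^{\mathsf{a}_1}\tilde{\mathsf{C}}_{\mathsf{a}_2}^{\mathsf{a}_3})$ (49)

(where $\tilde{\mathsf{C}}_{\mathsf{a}_2}^{\mathsf{a}_3}$ is a spin measurement along a different direction to
$\mathsf{C}_{\mathsf{a}_2}^{\mathsf{a}_3}$) because postselection affects the probability (as a simple
calculation will show).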
3. The pre- and post-selected probability Prob(B^ | A ai ) is well conditioned 
since 

Prob(B^|A ai C^) = Prob(B^|A a 'C^D^) (50) 

for any Dg* and . This is true because C is a complete spin measurement 
(corresponding to a non-degenerate observable) and so subsequent
postselection does not affect this probability.
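
The three examples can be checked numerically. The following is a minimal sketch, assuming sharp (projective) non-demolition spin measurements with all directions in the x-z plane, so that an outcome along angle t2 following a state along angle t1 occurs with probability cos^2((t1 - t2)/2). The names overlap, joint, and conditional are illustrative and do not appear in the paper.

```python
import math

def overlap(theta1, theta2):
    # |<theta1|theta2>|^2 for spin-half states along angles theta1, theta2 (x-z plane).
    return math.cos((theta1 - theta2) / 2.0) ** 2

def joint(chain):
    # chain[0] is the prepared direction; each later entry is the direction of the
    # outcome seen at the next sharp measurement (assumed projective).
    p = 1.0
    for prev, nxt in zip(chain, chain[1:]):
        p *= overlap(prev, nxt)
    return p

def conditional(chain, k):
    # Probability of the outcome at position k given the rest of the chain,
    # normalising over the two outcomes (a direction and its opposite) at k.
    flipped = list(chain)
    flipped[k] = chain[k] + math.pi
    num = joint(chain)
    return num / (num + joint(flipped))

UP = 0.0               # preparation A: spin up
B_DIR = math.pi / 2    # direction of the outcome at B
C_DIR = math.pi / 3    # direction of the outcome at C

# Example 1: Prob(C | A B) changes when B is measured along another direction.
ex1 = conditional([UP, B_DIR, C_DIR], 2)
ex1_alt = conditional([UP, 0.0, C_DIR], 2)

# Example 2: Prob(B | A C) changes with the post-selected direction at C.
ex2 = conditional([UP, B_DIR, C_DIR], 1)
ex2_alt = conditional([UP, B_DIR, 0.0], 1)

# Example 3: once the outcome at C is fixed, a later outcome at D is irrelevant.
ex3 = conditional([UP, B_DIR, C_DIR], 1)
ex3_with_d = conditional([UP, B_DIR, C_DIR, math.pi / 4], 1)

print(ex1, ex1_alt)
print(ex2, ex2_alt)
print(ex3, ex3_with_d)
```

In this sketch the first two pairs of printed values differ, while the third pair agrees because the factor contributed by D cancels between numerator and denominator.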

5 Systems, states, effects, and transformations 

Having incorporated the notion of probability into the circuit framework we are 
now in a position to define a number of important notions. In particular, we 
can associate states with preparations, transformation matrices with transfor- 
mations, and effects with results. In this section we will see how to do this and 
define a number of related concepts. 

5.1 Equivalence of fragments 

We can define a rather useful notion of equivalence of fragments. Any circuit can 
be broken up into fragments. If we have a particular fragment, A, then it can 
be completed into a circuit with another fragment, C (note we are suppressing 
type labels since we are discussing general fragments). In general, there will 
be many fragments that complete a given fragment into a circuit. We will say 
that fragments A and B are equivalent if either one can replace the other in any 
circuit and the probability of the circuit remains unchanged when we perform 
such a replacement. That is 

A = B iff (i) for every circuit AC there exists a circuit BC and vice 
versa and (ii) Prob(AC) = Prob(BC). 

Two fragments can only be equivalent if they have the same types of inputs and 
outputs with comparable causal structure (so we can plug one fragment in the 
place of the other). 

We will also define a restricted notion of equivalence for transformations. 
Any fragment having some open inputs and outputs can be put in transforma- 
tion mode (see Sec. 13 . T[) . It is useful to define a notion of equivalence for the 
case that we only consider transformation mode. For this we will use the symbol 
=. We will say A = B iff, for every circuit AC in which A is in transformation 
mode there exists a circuit BC where B is in transformation mode such that 
Prob(AC) = Prob(BC). We can extend this to include preparations and results. 
For them, restricted equivalence is the same as equivalence (since there is no 
issue of having to put them into transformation mode). In Part III we will
introduce the assumption of full decomposability to set up the mathematical
framework within which we give mathematical axioms for quantum theory. In
Part IV we give operational postulates for quantum theory. The third of these
is the assumption of tomographic locality. In fact the assumptions of full
decomposability and tomographic locality will be shown to be equivalent. We will
show that it follows from tomographic locality that equivalence and restricted
equivalence coincide: A is equivalent to B if and only if A is equivalent to B
in the restricted sense.

5.2 Maximal sets of distinguishable preparations 

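Consider a set of preparations, $\mathsf{A}^{\mathsf{a}_1}[n]$ where $n = 1, 2, \dots$. These
preparations are said to form a distinguishable set if there exists a
measurement $\{\mathsf{B}_{\mathsf{a}_1}[n] : n = 1, 2, \dots\}$ such that

$\mathrm{Prob}(\mathsf{A}^{\mathsf{a}_1}[n]\,\mathsf{B}_{\mathsf{a}_1}[n']) = \delta_{nn'}$. (51)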

If there exists no distinguishable set of preparations having more elements, then 
this set is said to be maximal. We denote the number of elements in a maximal
set of distinguishable preparations by $N_{\mathsf{a}}$ (for a proto-system of type a). The
measurement that distinguishes them is called a maximal measurement. The 
elements of a maximal measurement are called maximal results. We will re- 
serve the letters U, V, and W to denote maximal distinguishable sets and the 
corresponding results. Thus, we have 

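$\mathrm{Prob}(\mathsf{U}^{\mathsf{a}_1}[n]\,\mathsf{U}_{\mathsf{a}_1}[n']) = \delta_{nn'}$ (52)

where $\{\mathsf{U}^{\mathsf{a}_1}[n] : n = 1 \text{ to } N_{\mathsf{a}}\}$ is the maximal distinguishable set of
preparations, and $\{\mathsf{U}_{\mathsf{a}_1}[n] : n = 1 \text{ to } N_{\mathsf{a}}\}$ is the corresponding maximal measurement.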
Note there is no ambiguity using the same symbol for preparations and the 
corresponding results since the position of the label a tells us whether we have 
a preparation or a result. We have similar properties for V and W (in some
proofs we will need to refer to more than one maximal distinguishable set). 

The maximum amount of classical information we can send with a proto-system
is given by $N_{\mathsf{a}}$. Measured in bits, it is equal to $\log_2 N_{\mathsf{a}}$. We will call
$\log_2 N_{\mathsf{a}}$ the information carrying capacity.

We can use the notion of a maximal measurement to define certain restricted 
classes of preparations which we will call informational subsets. 

Informational subsets. Let $\{\mathsf{U}_{\mathsf{a}_1}[n] : n = 1, 2, \dots\}$ be a maximal
measurement. An informational subset, S, is associated with a
subset of the outcomes $O(S) \subseteq \{1, 2, \dots, N_{\mathsf{a}}\}$. We define the
informational subset S as consisting of all preparations, $\mathsf{A}^{\mathsf{a}_1}$, that, when
this maximal measurement is performed, only give rise to outcomes
in the associated subset, O(S).

This means, in particular, that

if $\mathsf{A}^{\mathsf{a}_1} \in S$ then $\mathrm{Prob}(\mathsf{A}^{\mathsf{a}_1}\mathsf{U}_{\mathsf{a}_1}[n]) = 0$ for all $n \in \bar{O}(S)$, (53)

where $\bar{O}(S)$ is the set of outcomes not in O(S). For each informational subset, S,
we can define the complement subset $\bar{S}$ which is associated with the complement
set of outcomes $\bar{O}(S)$.

We will say that system types having $N_{\mathsf{a}} = 1$ are trivial.
5.3 Systems and filters

We defined proto-systems as being identified with synchronous sets of wires. We 
will now define what we mean by a system. To do this we need to define a filter. 
A filter is defined with respect to a maximal set of distinguishable preparations
for a proto-system. 

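A filter is associated with an informational subset of preparations,
S. It is defined to be a transformation, $\mathsf{F}_{\mathsf{a}_1}^{\mathsf{a}_2}$, that inputs and outputs
a proto-system, a (which could be composite), which leaves preparations
in the informational subset S unaffected and blocks preparations
in the complement informational subset $\bar{S}$. In symbolic form,

if $\mathsf{A}^{\mathsf{a}_1} \in S$ then $\mathrm{Prob}(\mathsf{A}^{\mathsf{a}_1}\mathsf{F}_{\mathsf{a}_1}^{\mathsf{a}_2}\mathsf{B}_{\mathsf{a}_2}) = \mathrm{Prob}(\mathsf{A}^{\mathsf{a}_1}\mathsf{B}_{\mathsf{a}_1})$ for all $\mathsf{B}_{\mathsf{a}}$, (54)

and

if $\mathsf{A}^{\mathsf{a}_1} \in \bar{S}$ then $\mathrm{Prob}(\mathsf{A}^{\mathsf{a}_1}\mathsf{F}_{\mathsf{a}_1}^{\mathsf{a}_2}\mathsf{B}_{\mathsf{a}_2}) = 0$ for all $\mathsf{B}_{\mathsf{a}}$. (55)

The capacity of the filter is defined to be equal to |O(S)|.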

[An example of a filter from quantum theory would be a transformation that 
projects onto a particular subspace of the Hilbert space. All states belonging 
to this subspace pass through unchanged. States orthogonal to the subspace
are blocked.]
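
The quantum example can be made concrete with a small sketch. It assumes pure states written as lists of amplitudes in the basis of the maximal measurement, a three-level system, and a filter that projects onto the first two basis states; the names apply_filter, prob, and KEPT are illustrative, and only a handful of results are sampled, whereas conditions (54) and (55) quantify over all results.

```python
def apply_filter(kept, amplitudes):
    # Project onto the subspace spanned by basis states whose index is in `kept`.
    return [amp if i in kept else 0 for i, amp in enumerate(amplitudes)]

def prob(result, state):
    # Born-rule probability |<result|state>|^2 for amplitude lists.
    inner = sum(complex(r).conjugate() * s for r, s in zip(result, state))
    return abs(inner) ** 2

KEPT = {0, 1}                     # plays the role of O(S); capacity |O(S)| = 2
in_subset = [0.6, 0.8j, 0.0]      # only ever gives outcomes 0 and 1
in_complement = [0.0, 0.0, 1.0]   # only ever gives outcome 2

# A few sample results B (a spot check, not an exhaustive one).
results = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [2 ** -0.5, 2 ** -0.5, 0],
    [0.5, 0.5j, 2 ** -0.5],
]

# Condition (54): the filter leaves preparations in S unaffected.
passes = all(
    abs(prob(b, apply_filter(KEPT, in_subset)) - prob(b, in_subset)) < 1e-12
    for b in results
)
# Condition (55): the filter blocks preparations in the complement subset.
blocked = all(prob(b, apply_filter(KEPT, in_complement)) == 0 for b in results)

print(passes, blocked, len(KEPT))
```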
We now define 

A system is associated with a wire, or synchronous set of wires, 
after a filter. The system type is determined by the filter (which 
in turn is determined by the associated informational subspace and 
maximal measurement) and the wire types the filter acts on. 

We will denote this system type by a letter different from that of the (unfiltered)
proto-system, for example b, to denote that it has been filtered. This definition 
of a system is consistent with actual practice. Typically, an experimentalist 
will make sure he gains some control over what passes through the apertures 
by filtering. We can absorb the filters into the definition of the operations (on 
the outputs, and, if we want, the inputs as well since this makes no difference). 
Hence, except when we are proving results which depend on using filters, we 
need not be explicit about whether we have filters in place or not. It is worth
noting a few points. 

1. To indicate that we have a new system, b, after a filter acting on a
proto-system a, we can write the filter as $\mathsf{F}_{\mathsf{a}_1}^{\mathsf{b}_2}$ though often we will write
$\mathsf{F}_{\mathsf{a}_1}^{\mathsf{a}_2}$.

2. A filter, $\mathsf{F}_{\mathsf{a}_1}^{\mathsf{a}_2}$, followed by another similar filter, $\mathsf{F}_{\mathsf{a}_2}^{\mathsf{a}_3}$, is clearly
also a filter which passes and blocks the same sets of preparations.

3. The "do nothing" transformation (the identity filter) is clearly a filter. 
Therefore proto-systems are systems. 
4. A composite of two proto-systems is a system itself since the identity
filter on each component clearly acts as a filter (the identity filter) on the
composite. However, it is not clear that a composite of two systems will,
in general, also be a system. In Part IV we prove that it follows from the
operational postulates given there that filters acting on the components
of a proto-system always comprise a filter on the composite proto-system.
Hence, if the postulates are true, a composite of two systems is a system
itself.

5. We will show that it follows from the postulates that we can build filters
corresponding to filtering to any informational subset, S (having any
number of elements in O(S)). We will also show that, for these filters,
|O(S)| is the maximum number of preparations that can be perfectly
distinguished after a filter. This means we can construct a system, b, having
any given maximum number, $N_{\mathsf{b}}$, of distinguishable preparations.

6. We will say that one filter is smaller than another if it has smaller |O(S)|.

Filters are defined to be transformations that act on proto-systems. Having
defined systems as what we have after a filter (on a proto-system), we can define
a notion of a filter on a system where the system, b, in question may itself have
been previously obtained by filtering a proto-system, a. A filter on a system,
b, is a filter on the underlying proto-system, a, defined with respect to a set of
outcomes $O(S') \subseteq O(S)$ of some maximal measurement that can be used to define
a filter from a to b (O(S) being the set of outcomes for this original filter).

5.4 States and effects 

We define 

The state associated with a preparation for a given type of system 
is that thing given by any mathematical object which can be used to 
calculate the probability for any circuit built from this preparation 
followed by any result for this type of system. 

Note, since we are calculating the probability for the circuit, we are using the 
state to calculate the joint probability of the outcome belonging both to the 
outcome set for the given preparation and for the given result (rather than
the conditional probability of seeing an outcome in the result outcome set given 
an outcome in the preparation outcome set). Also note that the system may be 
filtered. 

This definition ensures that the state is in one to one correspondence with 
equivalence classes of preparations. I.e. all preparations having the same state 
are equivalent to one another. 

One mathematical object that would serve as a state for some preparation,
$\mathsf{A}^{\mathsf{a}_1}$, for a system of type a would simply be a list of all the probabilities,
$\mathrm{Prob}(\mathsf{A}^{\mathsf{a}_1}\mathsf{Z}_{\mathsf{a}_1})$, for all circuits that can be built from $\mathsf{A}^{\mathsf{a}_1}$ followed by a result $\mathsf{Z}_{\mathsf{a}_1}$
where $\mathsf{Z}_{\mathsf{a}_1}$ runs over all possible results on a system of type a. In general this
will be a very long list of probabilities. However, in general, in a physical theory 
we expect there to be some relationships between the probabilities such that we 
can calculate all the probabilities from just a subset corresponding to a subset 
of the results (which we will call fiducial results). We will, further, consider only 
linear relationships (we will discuss this restriction below and in Appendix B).
Thus, we can consider a list of probabilities

$A^{a_1} = \mathrm{Prob}(\mathsf{A}^{\mathsf{a}_1}\mathsf{X}_{\mathsf{a}_1}^{a_1})$ for $a_1 = 1$ to $K_{\mathsf{a}}$ (56)

where $\mathsf{X}_{\mathsf{a}_1}^{a_1}$ are the fiducial results and, for a general result, $\mathsf{B}_{\mathsf{a}_1}$, the probability
is given by the linear relationship

$\mathrm{Prob}(\mathsf{A}^{\mathsf{a}_1}\mathsf{B}_{\mathsf{a}_1}) = A^{a_1}B_{a_1}$ (57)

where summation over the index $a_1$ is implied and $B_{a_1}$ is a list of coefficients
associated with the result $\mathsf{B}_{\mathsf{a}_1}$.

Note that we use sans serif font to represent fragments such as preparations
(e.g. $\mathsf{A}^{\mathsf{a}_1}$) and results (e.g. $\mathsf{B}_{\mathsf{a}_1}$). We also use sans serif for the type labels, a,
b, .... However, we use normal maths font to represent states (e.g. $A^{a_1}$) and
the coefficients associated with results. We also use normal maths font for the
index $a_1$ over which we sum.

The list of coefficients, $B_{a_1}$, associated with the result $\mathsf{B}_{\mathsf{a}_1}$ will play an
important role. We call this list of coefficients the effect associated with the given
result.

It is worth noting a few points at this stage. 

1. The choice of fiducial results, $\mathsf{X}_{\mathsf{a}_1}^{a_1}$, will not in general be unique. However,
we always choose minimal such sets (so that there exists no other set of
fiducial effects with fewer elements).

2. $A^{a_1}$ is a list of $K_{\mathsf{a}}$ probabilities giving the state associated with the
preparation $\mathsf{A}^{\mathsf{a}_1}$. Two preparations having the same state are equivalent.

3. $B_{a_1}$ is a list of $K_{\mathsf{a}}$ coefficients giving the effect associated with the result
$\mathsf{B}_{\mathsf{a}_1}$. These coefficients can, in principle, be negative [and in quantum
theory they will sometimes be negative]. Two results having the same
effect are equivalent.

4. The integer $K_{\mathsf{a}}$ plays an important role. It is the dimension of the state
(and effect) space for a system of type a.

5. If $K_{\mathsf{a}}$ is finite then the state can be determined from finitely many
probabilities. We call such system types "finite".

6. It is clear that $K_{\mathsf{a}} \geq N_{\mathsf{a}}$ since we need $N_{\mathsf{a}}$ fiducial results just to deal with
the distinguishable states (see (52)). [In quantum theory $K_{\mathsf{a}} = N_{\mathsf{a}}^2$.]
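
As an illustration of points 2, 3, and 6, here is a minimal sketch for a qubit (N = 2, K = 4). It assumes the state is parametrised by a Bloch vector (x, y, z) and that the fiducial results are chosen as spin up and spin down along z, spin up along x, and spin up along y; this particular fiducial set and the names fiducial_state, prob_linear, and prob_direct are choices made for the example, not taken from the paper.

```python
def fiducial_state(bloch):
    # List of K = 4 fiducial probabilities for a qubit with Bloch vector (x, y, z).
    x, y, z = bloch
    return [(1 + z) / 2, (1 - z) / 2, (1 + x) / 2, (1 + y) / 2]

def prob_linear(effect, state):
    # Equation (57): probability as a sum over the fiducial index.
    return sum(e * s for e, s in zip(effect, state))

def prob_direct(direction, bloch):
    # Textbook probability of finding spin up along the unit vector `direction`.
    return (1 + sum(n * r for n, r in zip(direction, bloch))) / 2

# Effect for "spin down along x": since (up z) + (down z) is the identity and
# (down x) = identity - (up x), the coefficients are (1, 1, -1, 0).
EFFECT_DOWN_X = [1, 1, -1, 0]

bloch = (0.3, -0.2, 0.5)
state = fiducial_state(bloch)
p_linear = prob_linear(EFFECT_DOWN_X, state)
p_direct = prob_direct((-1.0, 0.0, 0.0), bloch)

print(len(state), p_linear, p_direct)
```

The negative entry in EFFECT_DOWN_X is an instance of the remark in point 3 that effect coefficients can be negative even though every probability they produce is non-negative.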

In setting up these ideas we considered only linear relationships. It is possible 
that nonlinear relationships will lead to a smaller number of fiducial effects. 
However, in probability theories, it is very natural to allow arbitrary mixtures
of preparations (where we toss a $\lambda$-weighted coin and prepare one preparation if
we get heads, and another preparation if we get tails). It is shown in Appendix
B that, if we allow such mixtures, then we cannot do better than use linear
relationships and, indeed, we will necessarily have linear relationships for calcu- 
lating probabilities in any optimal representation of the state [30] . Even if we do 
not allow arbitrary mixtures, we can still choose to use a linear representation 
as here. The only caveat is that this may not be optimal [indeed, if we restrict 
ourselves to pure states in quantum theory a more optimal representation exists 
in terms of amplitudes rather than probabilities] . 

5.5 Assumption 2 for the circuit framework 

We make the following assumption 

Assump 2 There exists at least one finite and nontrivial system 
type. 

I.e. there exists at least one system type having $K_{\mathsf{a}}$ finite and $N_{\mathsf{a}} > 1$. This
assumption concerns two issues: (1) existence of non-trivial systems and (2) 
finiteness. 

We will see in Part IV that it is sufficient to assume the existence of one
finite nontrivial type to prove from the postulates that there exist types for
every finite $N_{\mathsf{a}}$ and that these will be finite.

This is a much weaker assumption than we might have required. We might 
have had to assume that there exist systems having N = 1, 2, 3, . . . and assume
that $K_{\mathsf{a}}$ is finite for finite $N_{\mathsf{a}}$.

It is impossible that we could prove that $K_{\mathsf{a}} = \infty$ experimentally. Thus, it
might be said that a finite value follows from the operational approach taken
in this paper. However, the issue is a little more subtle than that. If $K_{\mathsf{a}}$ is
finite then, once we surpass a certain level of experimental sophistication, the
measured value of $K_{\mathsf{a}}$ will stabilize at that finite value. However, if $K_{\mathsf{a}}$ is infinite,
then as we increase the sophistication of the experiment, the measured value
of $K_{\mathsf{a}}$ will continue to increase. We could assume that, for some fundamental
reasons, there is a limit to how sophisticated the experiment can be. In this
case the maximum measured value of $K_{\mathsf{a}}$ would have to be finite.

In [36] finiteness of the state space dimension follows from the simplicity
axiom. It is a background assumption in [13] and [E] . Masanes and Müller [55]
actually include an explicit axiom in their list of axioms, namely "In systems 
that carry one bit of information, each state is characterized by a finite set of 
outcome probabilities." 

5.6 Types of state 

There are many ways of classifying states. We will collect a few useful classifi- 
cations together here. 
The null state, $0^{a_1}$, has all entries equal to zero. One way (though not
necessarily the only way) to prepare such a state is to add an outcome to our
specification of a preparation that cannot happen and let the outcome set
associated with the null preparation have this outcome as its only member. Then
the joint probability for a circuit consisting of this preparation followed by any
effect must be zero (since the outcome on the preparation can never happen).

Mixed states are states which can be simulated by a probabilistic mixture of 
some distinct states. Thus, a mixed state, $A^{a_1}$, can be written

$A^{a_1} = \lambda B^{a_1} + (1 - \lambda) C^{a_1}$ (58)

where $B^{a_1}$ and $C^{a_1}$ are distinct and $0 < \lambda < 1$. We include the null state, $0^{a_1}$,
as a possible state in defining mixed states.
Pure states are states which are not mixed.

Two states, $A^{a_1}$ and $B^{a_1}$, are said to be parallel if $A^{a_1} = \mu B^{a_1}$ for some
positive number $\mu$. It is always possible to write the shorter of the parallel
states as a convex sum of the other and the null state. Thus, if $\mu < 1$ we can
write $A^{a_1} = \mu B^{a_1} + (1 - \mu) 0^{a_1}$.

Any state that is parallel to a pure state will be called a pure-parallel state. 
Pure-parallel states are, strictly speaking, mixed. However, they can only be 
written as a convex combination of the null state and the pure state to which 
the state is parallel. If pure-parallel states are normalized then they become 
pure. 
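
The classifications above can be illustrated with a short sketch. It assumes a classical bit, whose states are lists of two outcome probabilities; the names mix, scale, NULL, PURE_0, and PURE_1 are illustrative only.

```python
def mix(lam, state_b, state_c):
    # Equation (58): probabilistic mixture with weight lam on state_b.
    return [lam * b + (1 - lam) * c for b, c in zip(state_b, state_c)]

def scale(mu, state):
    # A state parallel to `state`, shorter by the factor mu.
    return [mu * s for s in state]

NULL = [0.0, 0.0]      # the null state: all entries zero
PURE_0 = [1.0, 0.0]    # pure state of a classical bit
PURE_1 = [0.0, 1.0]

mixed = mix(0.25, PURE_0, PURE_1)     # a mixture of two distinct pure states
pure_parallel = scale(0.5, PURE_0)    # parallel to a pure state
as_mixture = mix(0.5, PURE_0, NULL)   # the same state as a convex sum with NULL

print(mixed, pure_parallel, as_mixture)
```

Here pure_parallel and as_mixture coincide, which is the sense in which a pure-parallel state is, strictly speaking, mixed.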

5.7 Types of effect 

There are many types of effect we may consider. We will make particular use 
of the following types. 

A maximal effect is one associated with a maximal result (corresponding to 
one outcome of a maximal measurement). 

A deterministic effect is one associated with a deterministic result (having 
outcome set equal to the set of all outcomes). We will see that, in causal theories,
the deterministic effect is unique for any given system type (see T18 in Sec. 10.1
and [H]).

5.8 Sets of states 

Associated with any preparation is a state. Hence, associated with any set of 
preparations is a set of states. In Sec. 5.2 we defined two types of sets of
preparations, namely maximal distinguishable sets of preparations and informational
subsets of preparations. 

Associated with a maximal distinguishable set of preparations is a maximal 
distinguishable set of states. By convention, we always exclude the null state 
from such sets (if we did not do this then the maximum number of distinguish- 
able states would be greater by one). This is reasonable since it gives rise to 
zero probability for any circuit it is included in. 
Associated with an informational subset of preparations is an informational
subset of states. These are the states that only give rise to some given subset of 
outcomes of some given maximal measurement (and have probability zero for 
the complement set of outcomes). 

We now define an important notion.

Non-flat set of states. A set of states is non-flat if it is a spanning 
subset of some informational subset of states. 

By spanning we simply mean that any state in the informational subset
can be written as a linear combination (possibly with negative coefficients) of 
the states in the non-flat set. A set of states is said to be flat if it is not a 
spanning subset of any informational subset of states. The name flat is then 
justified since the set is missing at least one dimension. 

If we are given a set of states and have to test to see whether it is non-flat 
then we need to search for some informational subset of states for which it is a 
spanning subset. If there exists no such subset then the set of states is flat. 
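
The search just described becomes simple in classical probability theory, which gives a convenient sketch. It assumes K = N, states given as lists of outcome probabilities for the single maximal measurement, and that the informational subset for an outcome set O(S) spans exactly |O(S)| dimensions; under these assumptions the only candidate informational subset is the one whose outcomes are the joint support of the given states. The names rank and is_non_flat_classical are illustrative.

```python
from fractions import Fraction

def rank(vectors):
    # Rank by exact Gauss-Jordan elimination over the rationals.
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_non_flat_classical(states):
    # Outcomes that at least one of the states can give rise to.
    support = {i for s in states for i, p in enumerate(s) if p != 0}
    # Non-flat iff the states span the informational subset with O(S) = support.
    return len(support) > 0 and rank(states) == len(support)

half = Fraction(1, 2)
spanning = [[1, 0, 0], [0, 1, 0]]
redundant = [[1, 0, 0], [half, half, 0], [0, 1, 0]]
flat = [[half, half, 0]]
single_pure = [[1, 0, 0]]

print(is_non_flat_classical(spanning), is_non_flat_classical(redundant),
      is_non_flat_classical(flat), is_non_flat_classical(single_pure))
```

The single mixed state in `flat` needs a two-outcome informational subset but spans only one dimension, so it is flat; the single pure state is non-flat.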

Once we have the five postulates (P1 to P5) in place we will see that the
notion of a non-flat set of states turns out to be a kind of generalization of
the notion of a pure-parallel state. In particular, it follows from P1 that the
state in a single member non-flat set must be pure-parallel (see T20). Further,
non-flat sets share many properties with pure-parallel states: (i) A reversible
transformation preserves both the pure-parallel property and the non-flatness
property; (ii) The state of a composite system formed from components prepared
in pure-parallel states is also pure-parallel (this is implied by T25) and the set of
states for a composite system formed from the product states taken from
non-flat sets for the components is non-flat (this follows from the postulates, P3
in particular); (iii) It follows from the postulates (P5 in particular) that filters
send pure-parallel states to pure-parallel states (see T46) and, more generally,
they send non-flat sets to non-flat sets.

5.9 Transformations 

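In general, a transformation, $\mathsf{B}_{\mathsf{a}_1}^{\mathsf{b}_2}$, inputs a system of some type a and outputs a
system of some type b. If it acts on a preparation, $\mathsf{A}^{\mathsf{a}_1}$, then $\mathsf{A}^{\mathsf{a}_1}\mathsf{B}_{\mathsf{a}_1}^{\mathsf{b}_2}$ constitutes
a new preparation. The fiducial probabilities for this new preparation are

$\mathrm{Prob}(\mathsf{X}_{\mathsf{b}_2}^{b_2}\mathsf{B}_{\mathsf{a}_1}^{\mathsf{b}_2}\mathsf{A}^{\mathsf{a}_1}) := B_{a_1}^{b_2} A^{a_1}$ (59)

where we must have a linear expression on the right hand side since we can regard
$\mathsf{X}_{\mathsf{b}_2}^{b_2}\mathsf{B}_{\mathsf{a}_1}^{\mathsf{b}_2}$ as a result. Hence, the transformation matrix $B_{a_1}^{b_2}$ is associated with the
transformation $\mathsf{B}_{\mathsf{a}_1}^{\mathsf{b}_2}$.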

We will say a transformation acts on a system if it inputs and outputs the 
same type of system. In particular, if the system has been filtered, this means 
that it will remain in the appropriate informational subset. 

An identity transformation is a transformation on a system that leaves things 
unchanged. I.e. if it is inserted on any system in any circuit the probability for 
that circuit remains the same. 
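A transformation, $\mathsf{B}_{\mathsf{a}_1}^{\mathsf{a}_2}$, is said to be reversible if there exists another
transformation, $(\mathsf{B}^{-1})_{\mathsf{a}_2}^{\mathsf{a}_3}$, such that the transformation $\mathsf{B}_{\mathsf{a}_1}^{\mathsf{a}_2}(\mathsf{B}^{-1})_{\mathsf{a}_2}^{\mathsf{a}_3}$ is equivalent to the identity
transformation. Note that $(\mathsf{B}^{-1})_{\mathsf{a}_2}^{\mathsf{a}_3}$ is also, clearly, a reversible transformation.

If a reversible transformation, $\mathsf{B}_{\mathsf{a}_1}^{\mathsf{a}_2}$, acts on a system in a pure state then the
state afterwards must also be pure. To see this assume the contrary. Assume
$A^{a_1}$ is pure and

$A^{a_1} \longrightarrow B_{a_1}^{a_2} A^{a_1} = \lambda C^{a_2} + (1 - \lambda) D^{a_2}$ (60)

where $C^{a_2}$ and $D^{a_2}$ are distinct and $0 < \lambda < 1$. Therefore

$A^{a_1} = \lambda (B^{-1})_{a_2}^{a_1} C^{a_2} + (1 - \lambda) (B^{-1})_{a_2}^{a_1} D^{a_2}$ (61)

The states $(B^{-1})_{a_2}^{a_1} C^{a_2}$ and $(B^{-1})_{a_2}^{a_1} D^{a_2}$ must be distinct since $B^{-1}$ is reversible. Hence
$A^{a_1}$ is mixed which contradicts our starting point.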
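
The argument can be illustrated for a qubit. The sketch below assumes normalised qubit states represented by Bloch vectors (pure exactly when the vector has length 1) and takes a rotation about the z axis as the reversible transformation; the names rotate_z, radius, and mix are illustrative.

```python
import math

def rotate_z(angle, bloch):
    # A reversible transformation on a qubit: rotate the Bloch vector about z.
    x, y, z = bloch
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def radius(bloch):
    # Length of the Bloch vector; equal to 1 exactly for normalised pure states.
    return math.sqrt(sum(r * r for r in bloch))

def mix(lam, a, b):
    # Convex mixture of two states, as in equation (58).
    return tuple(lam * p + (1 - lam) * q for p, q in zip(a, b))

ANGLE = 0.7
pure = (1.0, 0.0, 0.0)
after = rotate_z(ANGLE, pure)                  # still on the Bloch sphere

c_state, d_state = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)
mixture = mix(0.5, c_state, d_state)           # a mixed output state

# Undoing the rotation maps the mixture to a mixture of the undone states,
# mirroring the step from (60) to (61).
undone = rotate_z(-ANGLE, mixture)
undone_termwise = mix(0.5, rotate_z(-ANGLE, c_state), rotate_z(-ANGLE, d_state))

print(radius(after), radius(mixture), radius(undone))
```

Because the inverse rotation acts linearly, a mixed output would force a mixed input, which is the contradiction used in the text.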

A non-mixing transformation is one that transforms pure states into pure- 
parallel states (i.e. states that are pure up to normalization). Strictly speaking 
pure-parallel states are mixed (they are a mixture of the given pure state and the 
null state). Consequently, calling such transformations non-mixing is potentially
misleading. However, this name conveys the idea better than the alternatives. 
[In quantum theory, filtering (i.e. projecting into a given subspace) is a non- 
mixing transformation since the projected state is pure (up to normalization) if 
the original state is.] 

A transformation is non-flattening if, for any non-flat set of states we send
in, the set of states coming out is also non-flat. Note that the dimension of the
space spanned by the output set may be different from the space spanned by the 
input set as the associated filter may be different. In the prelude we illustrated 
how filters themselves effect non-flattening transformations in quantum theory. 
In Appendix [A] we provide a proof that filters (and, indeed, all non-mixing 
transformations) are non-flattening in quantum theory. 

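A compound transformation on a system, $\mathsf{B}_{\mathsf{a}_1}^{\mathsf{a}_2}$, is a transformation on a
system which can be formed from two sequential transformations on a system
neither of which is equal to the identity. I.e. $\mathsf{B}_{\mathsf{a}_1}^{\mathsf{a}_2}$ is compound if we can write
$\mathsf{B}_{\mathsf{a}_1}^{\mathsf{a}_2} = \mathsf{C}_{\mathsf{a}_1}^{\mathsf{a}_3}\mathsf{D}_{\mathsf{a}_3}^{\mathsf{a}_2}$.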

5.10 Assumption 3 for the circuit framework 

Fragments have a certain input-output structure. This captures both the types 
of the inputs and outputs left open and also the causal structure between them. 
Fragments with the same input-output structure can be plugged in place of 
each other in a bigger circuit. Consider the set, S, of fragments having some 
given input-output structure. A general fragment, $A \in S$, is characterized by
its probabilistic properties 

{Prob(AC) : all C such that AC is a circuit} (62) 

(we are suppressing the type labels since we are dealing with the general case) . 
We can consider a hypothetical fragment, Q, with the same input-output struc- 
ture as the fragments in S. By "hypothetical" we mean that we are considering
the possibility that a fragment with a given input-output structure having cer- 
tain probabilistic properties may exist. 

We say that A is operationally indiscernible from B to an accuracy of $\delta$ if

|Prob(AC) - Prob(BC)| < $\delta$ (63)

for all fragments C such that AC and BC are circuits. 

We make the following assumption in the circuit framework. 

Assump 3 If, for any accuracy $\delta > 0$, there exists a fragment A[$\delta$]
that is operationally indiscernible from a given hypothetical fragment,
Q, then there actually exists a fragment with the probabilistic
properties of Q.

If Q is operationally indiscernible from some fragment in S for any accuracy
$\delta > 0$ then, whatever level of accuracy we work to, there always exists some
fragment that behaves in the same way as the given hypothetical fragment.
Since measurements cannot be arbitrarily accurate, there is no way of telling
apart, by operational means, the case where we make Assump 3 from the
case where we do not. In this sense this assumption is one of mathematical
convenience. Chiribella, D'Ariano, and Perinotti have an equivalent background
assumption in [13] (and they give a similar motivation).

In Appendix [C] we consider the vectors formed from fiducial probabilities 
that characterize fragments having the same input-output structure. We show 
that, for the case where only a finite number of fiducial probabilities is required, 
it follows from Assump 3 that the space of such vectors is compact (i.e. they 
are bounded and closed). This means, in particular, that the space of states, 
effects, and transformations are compact (so long as they are characterized by 
a finite number of fiducial probabilities). More generally, it also implies that 
the sets of allowed duotensors considered in Part III are compact for this finite
case. 
Part III

Mathematical reformulation of 
quantum theory 

In this Part we will provide a reformulation of quantum theory mediated by the 
duotensor framework put forward in [44]. In Sec. 6 we will review the framework
showing how duotensors (tensor-like objects but with a bit more structure) can
be associated with operations. In Sec. 7 we will show how a rather analogous
framework can be set up for associating operators with duotensors. In Sec. 8 we
show how to use duotensors to form a bridge between operations and operators
and we use this to reformulate quantum theory in terms of two mathematical 
axioms. The reformulation we obtain is very similar to that provided by Chiri- 
bella, D'Ariano, and Perinotti (CDP) in [12]. The latter does not use duotensors 
as intermediate objects nor does it use the more succinct notation we adopt. 
However, the basic formulae of the two approaches are related by appropriate 
insertion of partial transposes (as discussed in Sec. 8.5). The reformulation
provided in this paper and that of CDP have much in common with the quantum
realization of the causaloid formulation given in [38, 39]. All three consider
associating mathematical objects with general circuit fragments. 

This paper is written such that it is possible to follow the reconstruction of
quantum theory from postulates in Part IV up to and including Sec. 11, where we
obtain the qubit, without reading Part III. The last part of the reconstruction, in
which we obtain quantum theory for arbitrary Hilbert space dimension, employs
ideas from Part III.

6 Operations and duotensors 

The duotensor framework was put forward in [44]. It pertains to circuits and
fragments built out of operations and wires. It is built on two assumptions. 
The first is the same as Assump 1 - that we can associate a probability with 
a circuit and this probability depends only on the specification of that circuit. 
The second is that operations are fully decomposable (we will explain this below). 
We will show that this second assumption is equivalent to the assumption of 
tomographic locality, P3. It has been developed for the case of finite dimensional 
state spaces (i.e. finite K s ). The duotensor framework is, therefore, applicable 
to the reconstruction from operational postulates to be given in Part IIV1 The 
notation employed so far in this paper is, for the most part, the same as in |44j . 

6.1 Motivation 

The motivation for constructing the duotensor framework is to provide a for- 
mulation for physical theories having the following property 
Formalism locality: A formalism for a physical theory is said to
have the property of "formalism locality" if we can do calculations 
pertaining to any region of spacetime employing only mathematical 
objects associated with that region. 

Note that this is a property of the way a theory is formulated rather than 
being an intrinsic property of the physics itself. In the circuit framework an 
arbitrary region of spacetime refers to an arbitrary fragment. In the duotensor 
framework we are able to associate a mathematical object, the duotensor, with 
any arbitrary fragment. Further, by having a duotensor for different fragments 
pertaining to the same setup (i.e. the same apparatus uses but with different 
outcome sets - see Sec. 3.4) we can deduce whether there are any probabilities
for this setup which happen to be independent of what is happening elsewhere 
and what these probabilities are equal to. Hence, we have a formulation having 
the property of formalism locality. 

In standard formulations of quantum theory we evolve a state in time with 
respect to some foliation of the spacetime. Circuits can be foliated using com- 
plete synchronous sets of wires (a set of wires is complete if it partitions the 
circuit) [44]. If we foliate a circuit in this way we are effectively partitioning it
so that a preparation is followed by a bunch of sequential transformations fol- 
lowed, finally, by an effect. However, preparations, transformations, and effects 
are special cases of fragments. If our formalism requires that we use a foliation 
in this way then we do not have the formalism locality property. 

Histories formulations have a similar problem - they generally pertain to the 
entire history of the system and do not allow us to consider different arbitrary 
space time regions by themselves. 

In these formulations we effectively have equations that apply only to spe- 
cially shaped spacetime regions for which the probabilities are necessarily well 
conditioned. For example, in theories which evolve a state, these specially 
shaped regions must have an initial and a final space-like hypersurface for any 
interval over which the state is evolved. To make statements about arbitrary 
spacetime regions in such theories, we need to first apply the theory to the spe- 
cially shaped regions. This is the reason that these formulations do not have 
the formalism locality property. 

It would be impossible to identify specially shaped regions if we had indefinite
causal structure, as we expect in a theory of quantum gravity. Given this, it is
natural that the theory be formulated in such a way that it has the formalism locality
property. In [33 ESJ [53j |43l 06], the causaloid framework is developed. It 
provides a formalism local framework pertaining to situations where we do not 
have definite causal structure. Dealing with indefinite causal structure is not an 
immediate issue for the present paper since we are considering circuits whose 
wires define a causal structure. However, quantum theory with well defined 
causal structure may ultimately be best understood as a limiting case of a more 
general theory of quantum gravity having indefinite causal structure. In this 
case, it is likely that the formulation of quantum theory we would most naturally 
arrive at in such a limiting procedure would be one having the formalism locality 
property. Consequently, a formalism local formulation of quantum theory, as
provided in this work, is more likely to provide insight into the problem of 
quantum gravity. 

6.2 Equivalence 

In Sec. 5.1 we defined two fragments to be equivalent if the probability for any
circuit containing one of these fragments is unchanged when the other fragment
is substituted in its place. The duotensor framework uses a more general notion
of equivalence that applies to linear sums of fragments. This more general notion
reduces to the notion of Sec. 5.1 in the appropriate special cases.
First we define the function $p(\cdot)$ as follows

$$p(\alpha \mathsf{A} + \beta \mathsf{B} + \dots) := \alpha\,\mathrm{Prob}(\mathsf{A}) + \beta\,\mathrm{Prob}(\mathsf{B}) + \dots$$

for circuits $\mathsf{A}, \mathsf{B}, \dots$ and real numbers $\alpha, \beta, \dots$ (these can be negative). Note
that the $p(\cdot)$ function is only defined for linear sums of circuits. We cannot define
something like this for general linear sums of fragments because fragments do
not, in general, have a probability associated with them.
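As a minimal illustration of how $p(\cdot)$ extends Prob linearly to formal sums of circuits, the following Python sketch uses a hypothetical table of circuit probabilities. The circuit names `A` and `B` and their values are made up for illustration and are not taken from the paper.

```python
from fractions import Fraction

# Hypothetical probabilities for two complete circuits (illustrative values only).
PROB = {"A": Fraction(1, 2), "B": Fraction(1, 4)}

def p(linear_sum):
    """Linear extension of Prob; linear_sum is a list of (coefficient, circuit name)."""
    return sum(coeff * PROB[name] for coeff, name in linear_sum)

# p(3A - 2B) = 3 Prob(A) - 2 Prob(B); the coefficients may be negative.
value = p([(3, "A"), (-2, "B")])
print(value)
```

Note that the value of `p` on a linear sum need not itself be a probability, since the coefficients are arbitrary real numbers.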
We will consider expressions like

$$\text{expression} = \alpha + \beta \mathsf{C} + \gamma \mathsf{D} + \dots$$

where $\alpha, \beta, \dots$ are real numbers and $\mathsf{C}, \mathsf{D}, \dots$ are fragments (they may be
circuits). Equivalence is defined in the following way

Equivalence: We write

$$\text{expression}_1 \equiv \text{expression}_2$$

if

$$p(\text{expression}_1\, \mathsf{E}) = p(\text{expression}_2\, \mathsf{E})$$

for any fragment $\mathsf{E}$ that makes the contents of the argument on both
sides of this equation into a linear sum of circuits (note that we are
suppressing the subscripts and superscripts on the symbol $\mathsf{E}$ since
we are talking about a general fragment).

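One example illustrating this is the following. We have

$$\alpha \mathsf{A}^{\mathsf{a}_1} + \beta \mathsf{B}^{\mathsf{a}_1} \equiv \gamma \mathsf{C}^{\mathsf{a}_1} + \delta \mathsf{D}^{\mathsf{a}_1}$$

if

$$p([\alpha \mathsf{A}^{\mathsf{a}_1} + \beta \mathsf{B}^{\mathsf{a}_1}]\mathsf{E}_{\mathsf{a}_1}) = p([\gamma \mathsf{C}^{\mathsf{a}_1} + \delta \mathsf{D}^{\mathsf{a}_1}]\mathsf{E}_{\mathsf{a}_1}) \quad \text{for all } \mathsf{E}_{\mathsf{a}_1}$$

Here is another (important) example. In general, we have

$$\mathsf{A} \equiv \mathrm{Prob}(\mathsf{A}) \quad \text{for any circuit } \mathsf{A} \qquad (64)$$

The proof of this is simple. The most general fragment, $\mathsf{E}$, that completes a
circuit into a circuit is another circuit. For any circuit $\mathsf{E}$

$$p(\mathsf{A}\mathsf{E}) = p(\mathsf{A})\,p(\mathsf{E}) = p(\mathrm{Prob}(\mathsf{A})\,\mathsf{E})$$

where the first equality holds because probabilities factorize over disjoint
circuits and the second follows from the linearity of $p(\cdot)$.

This example emphasizes that equivalence is a weaker notion than equality.
Clearly a circuit is not, itself, equal to a number.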

In general, there are only two types of equivalence:

1. Each expression is a real number plus a linear combination of circuits:

$$\alpha + \beta \mathsf{A} + \gamma \mathsf{B} + \dots \equiv \delta + \epsilon \mathsf{C} + \zeta \mathsf{D} + \dots$$

where $\mathsf{A}, \mathsf{B}, \dots, \mathsf{C}, \mathsf{D}, \dots$ are all circuits.

2. Each expression is a linear combination of fragments

$$\alpha \mathsf{A} + \beta \mathsf{B} + \dots \equiv \gamma \mathsf{C} + \delta \mathsf{D} + \dots$$

where $\mathsf{A}, \mathsf{B}, \dots, \mathsf{C}, \mathsf{D}, \dots$ are all fragments having the same causal
structure (so that any one can be substituted for any other of these fragments
in any circuit).

6.3 Fiducial results 

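We introduced the fiducial set of results $\{\mathsf{X}^{a_1}_{\mathsf{a}_1} : a_1 = 1 \text{ to } K_a\}$ in Sec. 5.4. We
can write

$$\mathsf{B}_{\mathsf{a}_1} \equiv B_{a_1} \mathsf{X}^{a_1}_{\mathsf{a}_1} \qquad (65)$$

Since we are employing the summation convention with respect to $a_1$ we have here
a linear sum of fiducial results weighted by the coefficients in $B_{a_1}$. To see that
(65) is correct note that

$$p(\mathsf{A}^{\mathsf{a}_1}\mathsf{B}_{\mathsf{a}_1}) = p(\mathsf{A}^{\mathsf{a}_1} B_{a_1}\mathsf{X}^{a_1}_{\mathsf{a}_1}) = p(\mathsf{A}^{\mathsf{a}_1}\mathsf{X}^{a_1}_{\mathsf{a}_1})\,B_{a_1} = A^{a_1}B_{a_1} \qquad (66)$$

which is the equation we obtained in Sec. 5.4.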
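To make (65) and (66) concrete, here is a small Python sketch for a single qubit. It is only an illustration under stated assumptions: the fiducial results are taken to be the projectors onto |0>, |1>, |+> and |+i> (so K = 4), and probabilities are computed with the quantum rule Trace(rho E), which the general framework of this section does not presuppose. The helper name `trace_prod` is hypothetical.

```python
def trace_prod(P, Q):
    """Real part of Trace(P Q) for 2x2 matrices stored as nested lists."""
    return complex(sum(P[i][k] * Q[k][i] for i in range(2) for k in range(2))).real

# Assumed fiducial results: projectors onto |0>, |1>, |+>, |+i>.
X = [
    [[1, 0], [0, 0]],
    [[0, 0], [0, 1]],
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.5, -0.5j], [0.5j, 0.5]],
]

rho = [[1, 0], [0, 0]]                    # preparation: the state |0>
A = [trace_prod(rho, Xa) for Xa in X]     # A^{a_1}: fiducial probabilities of the state

# The effect |-><-| equals X^0 + X^1 - X^2, so its coefficients B_{a_1} are:
B = [1, 1, -1, 0]
effect = [[0.5, -0.5], [-0.5, 0.5]]

prob_from_vectors = sum(a * b for a, b in zip(A, B))   # A^{a_1} B_{a_1}
prob_direct = trace_prod(rho, effect)                  # Trace(rho E) for comparison
print(A, prob_from_vectors, prob_direct)
```

The coefficients in `B` are not probabilities (one is negative); only the contraction with the state vector is.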

It will be useful to associate fiducial results with graphical elements

[Diagram (67): the fiducial result $\mathsf{X}^{a_1}_{\mathsf{a}_1}$, where $a_1 = 1$ to $K_a$, drawn as a box X whose link carries a black dot.]

When represented graphically, fiducial elements have a black dot. The reason
for this will become clear. We represent $B_{a_1}$ with a graphical element:

$$B_{a_1} \;\Leftrightarrow\; a\,\circ\!-\![B] \qquad (68)$$

This has a white dot on it. Equation (65) can be written in both symbolic and
diagrammatic form

[Diagram (69): Eq. (65) in symbolic and diagrammatic form.]

When we link up two elements we require that black dots are placed next to
white dots. The horizontal link in (69), interrupted by a black and a white dot,
corresponds to summing over the index $a_1$. We define

[Diagram (70): a link drawn without dots between the B box and the fiducial result is defined to mean the same link with a white dot next to the B box and a black dot next to the fiducial result.]

This is unambiguous at this stage since the fiducial result must have a black
dot so the B box must have a white dot. This is a hybrid diagram. Hybrid
diagrams have wires running up for operational description and links running
to the left for the mathematics. Horizontal links between boxes represent the
summation over the corresponding index.



6.4 Fiducial preparations 

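We can introduce a fiducial set of preparations, $\{{}_{a_1}\mathsf{X}^{\mathsf{a}_1} : a_1 = 1 \text{ to } K_a\}$. These
correspond to a linearly independent spanning set of states. We define ${}^{a_1}\!A$ by

$$\mathsf{A}^{\mathsf{a}_1} \equiv {}^{a_1}\!A \;\, {}_{a_1}\mathsf{X}^{\mathsf{a}_1} \qquad (71)$$

This can be related to $A^{a_1}$. Using (65) and (71) we obtain

$$p(\mathsf{A}^{\mathsf{a}_1}\mathsf{B}_{\mathsf{a}_1}) = {}^{a_1'}\!A \;\, p({}_{a_1'}\mathsf{X}^{\mathsf{a}_1}\mathsf{X}^{a_1}_{\mathsf{a}_1})\, B_{a_1} = {}^{a_1'}\!A \;\, {}_{a_1'}g^{a_1}\, B_{a_1} \qquad (72)$$

where we define

$${}_{a_1'}g^{a_1} := p({}_{a_1'}\mathsf{X}^{\mathsf{a}_1}\mathsf{X}^{a_1}_{\mathsf{a}_1}) \qquad (73)$$

We call this object the hopping metric. Since both the fiducial effects and
the fiducial preparations correspond to linearly independent sets of vectors, the
hopping metric must be non-singular. Comparing (66) and (72) we must have

$$A^{a_1} = {}^{a_1'}\!A \;\, {}_{a_1'}g^{a_1} \qquad (74)$$

Since the hopping metric is non-singular this shows that ${}^{a_1}\!A$ is an alternative
way of representing the state. The effect of the hopping metric is to cause the
indices to hop across the central symbol. With this in mind, we define

$${}_{a_1}B := {}_{a_1}g^{a_1'}\, B_{a_1'} \qquad (75)$$

This is an alternative way of representing the effect. We can write

$$\mathrm{Prob}(\mathsf{A}^{\mathsf{a}_1}\mathsf{B}_{\mathsf{a}_1}) = A^{a_1}B_{a_1} = {}^{a_1}\!A \;\, {}_{a_1}B = {}^{a_1'}\!A \;\, {}_{a_1'}g^{a_1}\, B_{a_1} = {}^{a_1}g_{a_1'}\, A^{a_1'} \;\, {}_{a_1}B \qquad (76)$$

where

$${}^{a_1}g_{a_1'} := \left({}_{a_1'}g^{a_1}\right)^{-1} \qquad (77)$$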
is the inverse of the hopping metric. The hopping metric is guaranteed to have 
only positive entries (less than or equal to one). Its inverse, however, can have 
negative entries. [In quantum theory the inverse of the hopping metric does 
have negative entries. Indeed, this can be regarded as the origin of the negative 
numbers appearing in convex probabilities frameworks. These negative numbers 
have conceptual implications [5^1 HI] •] 
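The index hopping in (74) to (77) can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the original construction: it takes a qubit, uses the states |0>, |1>, |+>, |+i> both as fiducial preparations and as fiducial results, and computes overlaps from Bloch vectors via Trace(rho_r rho_s) = (1 + r.s)/2. The function names `overlap`, `inverse` and `row_times` are hypothetical.

```python
from fractions import Fraction as F

# Assumed Bloch vectors of the fiducial states |0>, |1>, |+>, |+i>.
BLOCH = [(0, 0, 1), (0, 0, -1), (1, 0, 0), (0, 1, 0)]

def overlap(r, s):
    """Trace(rho_r rho_s) = (1 + r.s)/2 for pure qubit states."""
    return F(1 + sum(x * y for x, y in zip(r, s)), 2)

# Hopping metric: g[a'][a] = p(fiducial preparation a' followed by fiducial result a).
g = [[overlap(r, s) for s in BLOCH] for r in BLOCH]

def inverse(M):
    """Gauss-Jordan inverse of a square matrix of Fractions."""
    n = len(M)
    aug = [list(row) + [F(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def row_times(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

g_inv = inverse(g)

# Superscript form A^{a} of the state |->: its fiducial probabilities.
A_sup = [overlap((-1, 0, 0), s) for s in BLOCH]
# Pre-superscript form, obtained by inverting (74).
A_pre = row_times(A_sup, g_inv)
print(g_inv)
print(A_pre)
```

In this example every entry of `g` lies between 0 and 1, while `g_inv` and the pre-superscript form of the state both contain negative entries, in line with the bracketed remark above.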
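6.5 Diagrammatic representation of the simple circuit

We can represent (71) as

[Diagram (78): the diagrammatic counterpart of $\mathsf{A}^{\mathsf{a}_1} \equiv {}^{a_1}\!A \;\, {}_{a_1}\mathsf{X}^{\mathsf{a}_1}$.]

The elements of this equation are

[Diagram (79): the fiducial preparation ${}_{a_1}\mathsf{X}^{\mathsf{a}_1}$, drawn with a black dot, and the duotensor ${}^{a_1}\!A$, drawn as a box A with a white dot.]

Note that, again, we have a black dot on the fiducial element and we match
black and white dots. We define

[Diagram (80): a link drawn without dots between the A box and the fiducial preparation is defined to mean the same link with a white dot next to the A box and a black dot next to the fiducial preparation.]

Once again, this is not ambiguous at this stage since the fiducial preparation
must have a black dot and so the A box must have a white dot.
The hopping metric is represented by

$$\bullet\!-\!\bullet \;\Leftrightarrow\; {}_{a_1'}g^{a_1} := p({}_{a_1'}\mathsf{X}^{\mathsf{a}_1}\mathsf{X}^{a_1}_{\mathsf{a}_1}) \qquad (81)$$

and its inverse is represented by

$$\circ\!-\!\circ \qquad (82)$$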



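With these diagrammatic conventions we can redo some of the manipulations
above

[Diagram (83): the diagrammatic version of (72), in which the probability for the circuit consisting of A wired to B is written as the A box and the B box, each with a white dot, linked through the hopping metric ●-●.]

Further, we put

[Diagram (84): a black dot on a box is defined by attaching the hopping metric to the corresponding white dot, [A]-● := [A]-○●-● and ●-[B] := ●-●○-[B]; these are the diagrammatic versions of (74) and (75).]

With these definitions we can redo the RHS of (76) - the probability, $\mathrm{Prob}(\mathsf{A}^{\mathsf{a}_1}\mathsf{B}_{\mathsf{a}_1})$,
is equal to

[A]-●○-[B] = [A]-○●-[B] = [A]-○●-●○-[B] = [A]-●○-○●-[B] (85)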
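We can simply represent this as

[Diagram (86): the A box joined to the B box by a single link with no dots.]

By virtue of the notational conventions we have adopted, we can introduce pairs
of black and white dots, or delete them (in pairs) as we wish:

[Diagram (87): a link with no dots is equal to the same link with an adjacent pair of black and white dots inserted.]

This implies that both ●-○ and ○-● are equal to the identity since

●-●○-○ = ●-○ and ○-○●-● = ○-● (88)

as ○-○ is the inverse of ●-●.

Recall that, by virtue of the way we have defined equivalence, $\mathsf{A} \equiv \mathrm{Prob}(\mathsf{A})$
for any circuit $\mathsf{A}$ (this is equation (64) above). This means that

[Diagram (89): the circuit formed by wiring a fiducial preparation to a fiducial result is equivalent to ●-●,]

by virtue of the way that ●-● is defined in (81). Hence, we can write

[Diagram (90): a sequence of equivalent diagrams for the circuit consisting of A wired to B.]

We will give a sequence of equivalent diagrams like this when we consider a
more complicated circuit.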



6.6 Full decomposability 

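The second assumption used to set up the duotensor framework is the following

Full decomposability. We assume that any operation is equivalent
to a linear combination of operations each of which consists of an
effect for each input and a preparation for each output. We do not
lose any generality by choosing these to be fiducial sets since any
other set could be written as a linear combination of the fiducial
set. Hence, this assumption is equivalent to the statement that any
operation, $\mathsf{A}^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3}$, can be written as

$$\mathsf{A}^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3} \equiv {}^{d_4e_5\dots f_6}\!A_{a_1b_2\dots c_3}\; \mathsf{X}^{a_1}_{\mathsf{a}_1}\mathsf{X}^{b_2}_{\mathsf{b}_2}\cdots\mathsf{X}^{c_3}_{\mathsf{c}_3}\;\, {}_{d_4}\mathsf{X}^{\mathsf{d}_4}\, {}_{e_5}\mathsf{X}^{\mathsf{e}_5}\cdots\, {}_{f_6}\mathsf{X}^{\mathsf{f}_6} \qquad (91)$$

in symbolic notation, or

[Diagram (92): the operation A, with input wires a, b, ..., c and output wires d, e, ..., f, is equivalent to the duotensor box A linked to a fiducial result on each input and a fiducial preparation on each output.]

in diagrammatic notation.

We will prove in Sec. 9.3 that this assumption is equivalent to the assumption
of tomographic locality (i.e. P3).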



6.7 What are duotensors? 



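The object ${}^{d_4e_5\dots f_6}\!A_{a_1b_2\dots c_3}$ in (91) above is an example of a duotensor.
Diagrammatically, it corresponds to a box with all white dots on it

[Diagram (93): the box A with a white dot on every input link (a, b, ..., c) and on every output link (d, e, ..., f), corresponding to ${}^{d_4e_5\dots f_6}\!A_{a_1b_2\dots c_3}$.]

since we want to have black dots next to the fiducial elements in (92). We can
put a white dot on a fiducial element

[Diagram (94): a fiducial element with a white dot is defined as the fiducial element with a black dot linked to ○-○.]

A white dot on a fiducial element therefore corresponds to a sum over fiducial
elements weighted by the relevant entries in the inverse of the hopping metric.
With this understanding, we can place black and white dots on the links in
(92). In this way we can extract a box with inputs and outputs having black
and white dots. For example

[Diagram (95): a box A with a mixture of black and white dots on its links, together with the corresponding duotensor.]

The map between black and white dots and the placement of indices is given by

[Diagram (96): on an input, a white dot corresponds to a subscript and a black dot to a pre-subscript; on an output, a white dot corresponds to a pre-superscript and a black dot to a superscript.]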
Subscripts and pre-subscripts correspond to the inputs on the left of the box.
Superscripts and pre-superscripts correspond to outputs on the right of the
box. The object in (93) is tensor-like with a bit more structure: indices can
appear on the left as well as the right. The reason for this is that there are two
independently chosen basis sets associated with every index - a fiducial set of
effects and a fiducial set of preparations. (For tensors we only have one choice of
basis set associated with each index.) Given this, we will call this mathematical
object a duotensor. We can put an index on the right or hop it over to the left
(using the hopping tensor), or vice versa. For example,

[Diagram (97): a box A in which the colours of some dots have been changed by attaching hopping tensors to the corresponding links.]

or, in symbolic form,

[Equation (98): the symbolic counterpart of (97); the index placement is not recoverable from the source.]

Subscripts always correspond to inputs and superscripts always correspond to
outputs. For the diagrammatic representation, subscripts go on the left and
superscripts on the right of the boxes.

If we change to a new set of fiducial effects then we perform a transformation
that affects the subscripts and superscripts. If we change to a new set of fiducial
preparations then we perform a transformation that affects the pre-subscripts
and pre-superscripts. It is shown in Appendix D that these transformations
have the properties one would expect of a tensor-like object.

We can use the hopping metric to put the indices on the left or the right (or 
correspondingly, change their colour in the diagrammatic representation). It is 
interesting to consider a few possibilities 

1. All white dots correspond to the weighting in the sum over fiducial
elements. For example,

[Diagram (99): the box A with all white dots.]

2. All black dots correspond to the fiducial probabilities (when we place
fiducial preparations on each input and fiducial effects on each output).
For example

[Diagram (100): the box A with all black dots is equal to the probability, Prob, of the circuit obtained by placing fiducial preparations on each input of A and fiducial effects on each output.]

This follows by capping the inputs and outputs with fiducial elements.

3. Standard form is when we have all the indices on the right hand side
so we have only superscripts and subscripts. In standard form we are
only invoking the use of fiducial effects (and not fiducial preparations).
Diagrammatically this corresponds to having all white dots on the left
and all black dots on the right

[Diagram: a box with white dots on all links on its left and black dots on all links on its right.]

In earlier sections of this work we were, effectively, only using the standard
form. We can clearly connect duotensors up in standard form without
using the hopping metric since black dots will always be put next to white
dots.

6.8 The identity transformation and the hopping metric 

The hopping metric is, itself, a duotensor. We can put this duotensor in standard
form. Then we have ○-●. This is equal to the identity (as we noted in Sec. 6.5).
Hence, the operation corresponding to this duotensor must effect the identity
map. This operation is simply given by tagging on the fiducial elements. Hence,

[Diagram (101): the duotensor ○-● with fiducial elements tagged onto its two ends (LHS) is equivalent to the identity operation on a wire.]

We could have used ●-●, ○-○, or ●-○ on the LHS with matching colours on
the dots on the fiducial elements. Interestingly, (101) is equivalent to the
assumption of full decomposability (and therefore equivalent to the assumption
of tomographic locality in P3). This is clear since, by (101), we can insert the
LHS of (101) on the end of every wire coming into or going out of an operation.
Then we can apply the $p(\cdot)$ function to obtain full decomposability. We have
already shown that (101) follows from full decomposability so we have established
equivalence.

6.9 General circuits 

We can apply full decomposability to any circuit to convert the circuit into
an equivalent duotensor calculation. We will show how to do this by example.
Consider the circuit

[Diagram (102): an example circuit built from several operations joined by wires.]

Applying full decomposability we obtain the equivalent diagram

[Diagram (103): each operation in (102) replaced by its duotensor box linked to fiducial elements.]

which is the same as

[Diagram (104): the diagram (103) redrawn.]

We can insert black and white dots in each link (with a black dot next to
the fiducial element) then, using (89), insert the hopping metric to obtain the
equivalent diagram

[Diagram (105): the duotensor boxes linked to one another through hopping metrics.]

Note that we are implicitly using the fact that the $p(\cdot)$ function factorizes over
disjoint circuits. Hence we obtain

[Diagram (106): the duotensor boxes linked directly to one another.]

where we have cancelled over pairs of black and white dots. Using (64), we
obtain

[Diagram (107): Prob of the circuit (102) is equal to the duotensor diagram of the same shape.]
It is striking that the probability for a circuit is given by a duotensor calculation 
that looks the same as the circuit itself. A similar thing will be true if we use 
symbolic notation (putting all the duotensors in standard form). This will 
clearly be true for any circuit. In the diagrammatic case we need only rotate 
the diagram through 90° and change the font from sans serif to normal maths 
font. In the symbolic case we need only change the font. 
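The observation that the duotensor calculation has the same shape as the circuit can be illustrated with a toy contraction. The sketch below assumes a classical bit (K = 2), for which the fiducial preparations and results can be chosen so that the hopping metric is the identity and all forms of a duotensor coincide; the numerical values are made up for illustration. The circuit "preparation A, then transformation T, then result B" becomes the contraction $A^{a} T_{a}^{\;b} B_{b}$.

```python
from fractions import Fraction as F

# Toy classical bit (an assumption for illustration, not taken from the paper).
A = [F(1), F(0)]                  # A^{a}: the bit is prepared with value 0
T = [[F(3, 4), F(1, 4)],          # T_a^b: row a is the input index, column b the output index
     [F(1, 4), F(3, 4)]]          # (a noisy channel that flips the bit with probability 1/4)
B = [F(0), F(1)]                  # B_{b}: the result "the bit is found to be 1"

# Summation over repeated indices, one index per wire of the circuit.
prob = sum(A[a] * T[a][b] * B[b] for a in range(2) for b in range(2))
print(prob)
```

Each wire of the circuit corresponds to exactly one summed index, which is why the expression can be read off directly from the circuit.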



6.10 Formalism locality 

In this subsection we will show how we can formulate physical theories that can 
be put into the duotensor framework in a formalism local fashion. So far we have 
only shown how to calculate probabilities for circuits. In this section we will be
interested in calculating probabilities for fragments. It is reasonable to assume 
that circuits have probabilities associated with them that are independent of 
what is happening elsewhere (this is Assump 1). However, it is not reasonable 
to assume the same thing of fragments in general since the probability associated 
with a fragment may depend on what is outside the fragment (for example, a 
fragment has open inputs into which some system may be sent). However, in 
special circumstances, a fragment may have a probability that is independent 
(or approximately independent) of what is outside the fragment. 

In Sec. 4.5 we introduced the notion of well conditioned probabilities. We
can meaningfully speak of a probability $\mathrm{Prob}(\mathsf{A}|\mathsf{B})$ only if it is well conditioned
(so that $\mathrm{Prob}(\mathsf{A}|\mathsf{B}\mathsf{C})$ is independent of $\mathsf{C}$ where $\mathsf{C}$ completes $\mathsf{A}\mathsf{B}$ into a circuit).
We now introduce a further notion.

Well conditioned probability ratio: We will say the probability
ratio

$$\frac{\mathrm{Prob}(\mathsf{A}[i])}{\mathrm{Prob}(\mathsf{A}[j])}$$

is well conditioned if

$$\frac{\mathrm{Prob}(\mathsf{A}[i]\mathsf{C})}{\mathrm{Prob}(\mathsf{A}[j]\mathsf{C})} \qquad (108)$$

is independent of $\mathsf{C}$ where $\mathsf{C}$ is any fragment which completes $\mathsf{A}[i]$
(and $\mathsf{A}[j]$) into a circuit. Here $\mathsf{A}[i]$ and $\mathsf{A}[j]$ correspond to different
outcome sets for the same setup.

According to Assump 1 we can associate probabilities with both the numerator
and denominator of this expression (as they correspond to circuits) so this
is a test we can run. Note that, by (39), this is equivalent to demanding that

$$\frac{\mathrm{Prob}(\mathsf{A}[i]\,|\,\mathsf{C})}{\mathrm{Prob}(\mathsf{A}[j]\,|\,\mathsf{C})}$$

is independent of $\mathsf{C}$ (the conditional probabilities in this expression are well
conditioned since $\mathsf{A}[i]\mathsf{C}$ and $\mathsf{A}[j]\mathsf{C}$ are circuits).

If $\mathrm{Prob}(\mathsf{A}[i])$ and $\mathrm{Prob}(\mathsf{A}[j])$ are each well conditioned then it follows that
their ratio is. However, it is possible that, taken separately, they are not well
conditioned but that the ratio is. If we define probabilities as long-run relative
frequencies then the probability ratio is equal to the number of times $\mathsf{A}[i]$
happens divided by the number of times $\mathsf{A}[j]$ happens in the long run.

A special case is where the outcome set $O_j$ for the fragment $\mathsf{A}[j]$ is the set of
all possible outcomes. We denote this $\mathsf{A}[I]$. Since some outcome must happen
we know that $\mathrm{Prob}(\mathsf{A}[I]\,|\,\mathsf{B}) = 1$ for any $\mathsf{B}$. We can, then, regard the probability
$\mathrm{Prob}(\mathsf{A}[i]\,|\,\mathsf{B})$ as a probability ratio (using (38))

[Equation (109): not recoverable from the source.]

Hence, the idea of a probability ratio is more general than that of a conditional
probability.

A probability ratio that is not well conditioned is not well defined. Whatever 
value we write down for it could be made to be wrong by an adversary who has 
control over other conditions that would affect the outcome we are looking at.
Therefore, we cannot expect a physical theory to predict the values of probability 
ratios that are not well conditioned. However, it is reasonable to expect our 
physical theory to tell us whether a probability ratio is well conditioned. 

Our objective is to construct a mathematical framework for theories which 

1. Associates mathematical objects with all fragments. 

2. Provides a mathematical condition for saying whether the probability ratio
$\mathrm{Prob}(\mathsf{A}[i]) / \mathrm{Prob}(\mathsf{A}[j])$ for any $\mathsf{A}[i]$ and $\mathsf{A}[j]$ (for the same setup) is well
conditioned employing only mathematical objects associated with these fragments.

3. In the case that the probability ratio is well conditioned, provides an 
expression saying what it is equal to employing only mathematical objects 
associated with the fragments A[i] and A[j]. 

If we can do this we will have the formalism locality property. 

We will now see how to achieve this objective in the duotensor framework. 
First we state the result. 



The probability ratio

$$\frac{\mathrm{Prob}(\mathsf{E}[i])}{\mathrm{Prob}(\mathsf{E}[j])} \qquad (110)$$

where $\mathsf{E}[i]$ and $\mathsf{E}[j]$ are two fragments corresponding to different outcome
sets for the same setup is

well conditioned if and only if the corresponding duotensors, $E[i]$
and $E[j]$, are proportional, and

equal to the constant of proportionality $k$ in $E[i] = kE[j]$ (if well
conditioned).

To prove this we note that, for the probability ratio (110) to be well conditioned,
we require that

$$\frac{\mathrm{Prob}(\mathsf{E}[i]\mathsf{F})}{\mathrm{Prob}(\mathsf{E}[j]\mathsf{F})} \qquad (111)$$

be independent of $\mathsf{F}$ for any choice of fragment $\mathsf{F}$ that completes the circuit. One
set of fragments for $\mathsf{F}$ we can consider are the fragments that consist of simply
putting fiducial preparations on each input of $\mathsf{E}[i]$, and $\mathsf{E}[j]$, and fiducial effects
on each output. This gives us the fiducial probabilities for these two fragments.
Let us consider (111) with respect to these choices for $\mathsf{F}$. We saw in Sec. 6.7
that the entries of the duotensor with all black dots are equal to the fiducial
probabilities. Hence, for (111) to be independent of $\mathsf{F}$, we require that the
elements of $E[i]$ are proportional to the corresponding elements of $E[j]$ with the
same constant of proportionality. Hence the two duotensors must be proportional.
(This is clearly necessary when these two duotensors are in "all black dots form"
but it must also be true when they are both in any other given form since
multiplication by hopping tensors, which are non-singular, will not affect such a
proportionality relationship.) This is a necessary condition but it is also a
sufficient condition since, clearly, (111) is independent of the choice of $\mathsf{F}$
(where this completes the circuit) when $E[i]$ and $E[j]$ are proportional. Further,
when the two duotensors are proportional the probability ratio is simply given by
the proportionality constant.

This result is in accordance with our objectives as stated above. In particu- 
lar, note that we have the property of formalism locality since we employ only 
the duotensors associated with the fragments E[i] and E[j] corresponding to the 
set up we are interested in (which might be part of a much bigger set up). 
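The test stated in the result above is simple enough to sketch in code. The following Python function is a minimal illustration, assuming the two duotensors have been flattened to lists of entries in the same form; the function name `probability_ratio` and the sample entries are hypothetical.

```python
from fractions import Fraction as F

def probability_ratio(E_i, E_j):
    """Return k with E_i = k * E_j entrywise, or None if the ratio is not well conditioned.

    E_i and E_j are the duotensors for two outcome sets of the same setup,
    flattened to lists of entries written in the same form.
    """
    pivot = next((n for n, y in enumerate(E_j) if y != 0), None)
    if pivot is None:
        return None                      # E_j vanishes, so no ratio can be formed
    k = F(E_i[pivot]) / F(E_j[pivot])
    if all(x == k * y for x, y in zip(E_i, E_j)):
        return k
    return None

# Illustrative entries only (not taken from the paper).
E_j = [F(1, 2), F(1, 4), F(1, 4), F(1, 2)]
E_i = [F(1, 6), F(1, 12), F(1, 12), F(1, 6)]
E_other = [F(1, 6), F(1, 12), F(1, 12), F(1, 3)]

print(probability_ratio(E_i, E_j))       # proportional: prints the constant k
print(probability_ratio(E_other, E_j))   # not proportional: prints None
```

Only the two duotensors for the fragments of interest enter the calculation, which is the formalism locality property in miniature.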

7 Operators and duotensors 

The duotensor framework was originally built to be applied to circuits and frag- 
ments built out of operations. However, as we will see, it can also be applied 
in a fairly analogous way to objects built out of operators acting on complex 
Hilbert spaces. In this section we present this as a purely mathematical struc- 
ture. Of course, our interest in this structure is that quantum theory fits into 
it very comfortably. The duotensor framework for operators is based on two 
facts (since we are in a mathematical rather than physical setting we have facts 
rather than assumptions). The first fact is that a circuit built from operators 
is equal to a real number which depends only on the details of this circuit (this 
is analogous to Assump 1 that we can associate a probability with a circuit 
that depends only on the details of that circuit). The second is that opera- 
tors are fully decomposable (which is, of course, analogous to the assumption 
that operations are fully decomposable). We will explain these facts more fully 
below. 

7.1 Operators 

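We start by introducing types, a, b, .... We have composite types such as aab,
etc. We may sometimes represent a composite type by a single letter. Next
we introduce complex Hilbert spaces, $\mathcal{H}_{\mathsf{a}_1}, \mathcal{H}_{\mathsf{b}_2}, \dots$ having dimensions $N_a, N_b, \dots$
(the type determines the Hilbert space dimension). We define the space of
Hermitian operators acting on these Hilbert spaces as $\mathcal{V}_{\mathsf{a}_1}, \mathcal{V}_{\mathsf{b}_2}, \dots$. These spaces
have dimension $N_a^2, N_b^2, \dots$ (we can deduce this simply by counting the number
of real parameters in a Hermitian matrix). We represent an operator in $\mathcal{V}_{\mathsf{a}_1}$ by
$\hat{A}_{\mathsf{a}_1}, \hat{B}_{\mathsf{a}_1}, \dots$. We will call this a result operator. We will introduce complex Hilbert
spaces, $\mathcal{H}^{\mathsf{a}_1}, \mathcal{H}^{\mathsf{b}_2}, \dots$ having dimensions $N_a, N_b, \dots$. These are isomorphic to
the earlier introduced Hilbert spaces (as they have the same dimensions) but
we introduce them also with a superscript label to enable us to define certain
structures later. The spaces of Hermitian operators on these Hilbert spaces are
$\mathcal{V}^{\mathsf{a}_1}, \mathcal{V}^{\mathsf{b}_2}, \dots$. These spaces have dimension $N_a^2, N_b^2, \dots$. Operators in $\mathcal{V}^{\mathsf{a}_1}$ will
be written $\hat{A}^{\mathsf{a}_1}, \hat{B}^{\mathsf{a}_1}, \dots$. We will call this a preparation operator.

We define the Hilbert space $\mathcal{H}_{\mathsf{a}_1\mathsf{b}_2} := \mathcal{H}_{\mathsf{a}_1} \otimes \mathcal{H}_{\mathsf{b}_2}$ for composite type ab.
This has dimension $N_{ab} = N_a N_b$. We can also define the Hilbert space
$\mathcal{H}^{\mathsf{a}_1\mathsf{b}_2} := \mathcal{H}^{\mathsf{a}_1} \otimes \mathcal{H}^{\mathsf{b}_2}$. This also has dimension $N_{ab} = N_a N_b$. In general we have

$$\mathcal{H}^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3} := \mathcal{H}_{\mathsf{a}_1} \otimes \mathcal{H}_{\mathsf{b}_2} \otimes \cdots \otimes \mathcal{H}_{\mathsf{c}_3} \otimes \mathcal{H}^{\mathsf{d}_4} \otimes \mathcal{H}^{\mathsf{e}_5} \otimes \cdots \otimes \mathcal{H}^{\mathsf{f}_6} \qquad (112)$$

The dimension of this space is $N_{ab\dots cde\dots f} = N_a N_b \cdots N_c N_d N_e \cdots N_f$.

We define $\mathcal{V}_{\mathsf{a}_1\mathsf{b}_2}$ as the space of Hermitian operators acting on $\mathcal{H}_{\mathsf{a}_1\mathsf{b}_2}$. In fact
we have $\mathcal{V}_{\mathsf{a}_1\mathsf{b}_2} = \mathcal{V}_{\mathsf{a}_1} \otimes \mathcal{V}_{\mathsf{b}_2}$ (this is true for complex Hilbert spaces but not for
real Hilbert spaces). The dimension of $\mathcal{V}_{\mathsf{a}_1\mathsf{b}_2}$ is $N_{ab}^2 = N_a^2 N_b^2$. We define
$\mathcal{V}^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3}$ as the space of Hermitian operators acting on $\mathcal{H}^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3}$. We have

$$\mathcal{V}^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3} = \mathcal{V}_{\mathsf{a}_1} \otimes \mathcal{V}_{\mathsf{b}_2} \otimes \cdots \otimes \mathcal{V}_{\mathsf{c}_3} \otimes \mathcal{V}^{\mathsf{d}_4} \otimes \mathcal{V}^{\mathsf{e}_5} \otimes \cdots \otimes \mathcal{V}^{\mathsf{f}_6} \qquad (113)$$

Note, again, that this is true when the underlying spaces are complex (rather
than real) Hilbert spaces. We write

$$\hat{A}^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3} \;\Leftrightarrow\; \text{[diagram: a box A with input wires a, b, ..., c and output wires d, e, ..., f]} \qquad (114)$$

for an operator in $\mathcal{V}^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3}$. We have given both the symbolic notation (on the
left) and diagrammatic notation (on the right). We will use the terminology
of inputs and outputs corresponding to the subscripts and superscripts
respectively.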

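We introduce a fiducial operator basis for $\mathcal{V}_{\mathsf{a}_1}$

$$\hat{X}^{a_1}_{\mathsf{a}_1} \;\Leftrightarrow\; \text{[diagram: a box X whose link carries a black dot]} \quad \text{where } a_1 = 1 \text{ to } K_a \qquad (115)$$

These are a spanning set of operators for the space $\mathcal{V}_{\mathsf{a}_1}$. This means that any
$\hat{B}_{\mathsf{a}_1} \in \mathcal{V}_{\mathsf{a}_1}$ can be written

$$\hat{B}_{\mathsf{a}_1} = B_{a_1} \hat{X}^{a_1}_{\mathsf{a}_1} \qquad (116)$$

We define

[Diagram (117): a link drawn without dots is defined to mean the same link with a white dot next to the B box and a black dot next to the fiducial operator]

(i.e. we can cancel over black and white dots). As before, the horizontal line
indicates that we are summing over the associated index ($a_1$ in this case).
Similarly, we introduce a fiducial set of operators for the space $\mathcal{V}^{\mathsf{a}_1}$

$${}_{a_1}\hat{X}^{\mathsf{a}_1} \;\Leftrightarrow\; \text{[diagram]} \quad \text{where } a_1 = 1 \text{ to } K_a \qquad (118)$$

We can write any operator, $\hat{A}^{\mathsf{a}_1} \in \mathcal{V}^{\mathsf{a}_1}$, as

$$\hat{A}^{\mathsf{a}_1} = {}^{a_1}\!A \;\, {}_{a_1}\hat{X}^{\mathsf{a}_1} \;\Leftrightarrow\; \text{[diagram]} \qquad (119)$$

We define

[Diagram (120): a link drawn without dots is defined to mean the same link with a white dot next to the A box and a black dot next to the fiducial operator]

so we can cancel over black and white dots.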



7.2 Tensor product notation 

The standard tensor product symbol, $\otimes$, is both redundant and potentially
obstructive. We will not generally use it. If we have two operators, $\hat{A}_{\mathsf{a}_1}$ and $\hat{B}_{\mathsf{b}_2}$
for example, we could write $\hat{A}_{\mathsf{a}_1} \otimes \hat{B}_{\mathsf{b}_2}$ since this is an operator in $\mathcal{V}_{\mathsf{a}_1} \otimes \mathcal{V}_{\mathsf{b}_2}$.
However, we will simply write $\hat{A}_{\mathsf{a}_1}\hat{B}_{\mathsf{b}_2}$. The fact that we have a tensor product
is clear because we have the labels $\mathsf{a}_1$ and $\mathsf{b}_2$ with different integers. We will
adopt this notation in general. For example, $\hat{A}_{\mathsf{a}_1} \otimes \hat{C}_{\mathsf{b}_2}$ will be written $\hat{A}_{\mathsf{a}_1}\hat{C}_{\mathsf{b}_2}$.
The order is not important, so we could also write this as $\hat{C}_{\mathsf{b}_2}\hat{A}_{\mathsf{a}_1}$. We could
have two instances of the same type, for example, $\hat{A}_{\mathsf{a}_1} \otimes \hat{B}_{\mathsf{a}_2}$ is written $\hat{A}_{\mathsf{a}_1}\hat{B}_{\mathsf{a}_2}$.
We know the latter is a tensor product because we have different integer labels
as subscripts to the types.

Note, in particular, $\hat{A}_{\mathsf{a}_1}\hat{B}_{\mathsf{a}_2}$ is a tensor product (it has different integer
labels) whereas $\hat{A}^{\mathsf{a}_1}\hat{B}_{\mathsf{a}_1}$ is not. Expressions like the latter appear later when we
put wires between operators (we will discuss this in more detail in Sec. 7.4).
The interpretation of $\hat{A}^{\mathsf{a}_1}\hat{B}_{\mathsf{a}_1}$ is that it corresponds to taking the trace after
the direct multiplication of the operators (if the operators were represented by
matrices then we would multiply the matrices together). In more conventional
notation we would write $\hat{A}^{\mathsf{a}_1}\hat{B}_{\mathsf{a}_1}$ as $\mathrm{Trace}(\hat{A}^{\mathsf{a}_1}\hat{B}_{\mathsf{a}_1})$, making the trace explicit.
We impose type matching when we put wires between operators (we are then
guaranteed that the spaces, $\mathcal{V}^{\mathsf{a}_1}$ and $\mathcal{V}_{\mathsf{a}_1}$, are of the same dimension so such
multiplication is possible). Note that, although these operators do not commute
under multiplication, the trace is the same whichever way we multiply them
together. Hence, we can safely write $\hat{A}^{\mathsf{a}_1}\hat{B}_{\mathsf{a}_1} = \hat{B}_{\mathsf{a}_1}\hat{A}^{\mathsf{a}_1}$. More generally we may
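have something like $\hat{A}_{\mathsf{a}_1}\hat{B}_{\mathsf{b}_2}\hat{C}^{\mathsf{a}_1}\hat{D}^{\mathsf{b}_3}$. In more conventional notation this is equal
to $\mathrm{Trace}(\hat{A}_{\mathsf{a}_1}\hat{C}^{\mathsf{a}_1})\,\hat{B}_{\mathsf{b}_2} \otimes \hat{D}^{\mathsf{b}_3}$.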

Even more generally we can have expressions such as A 3lbl B^ 3i C* 6 C3a4 where 
we have more than one wire. In this case the wire (or repeated label) indicates 
that we take the partial trace over the corresponding spaces. This means that, 
for example, 

A aib2 B£ ai C** e V 3lb6 (121) 

This is similar to Einstein's summation convention. Here the partial trace is 
implicit where ever we have a repeated index (or wire). We will explain below 
how such expressions can be calculated using full decomposability. We will refer 
to taking the partial trace over all the wires as the circuit trace. 

There is, in general, no ambiguity in dropping the tensor product symbol,
$\otimes$, as the integer labels on the type symbols carry all the necessary information.
Keeping the tensor symbol, on the other hand, would require that we took care 
to keep the symbols in the right order and will often require padding expressions 
with the identity. These requirements would add unnecessary complications in 
what follows and would go very much against the spirit of the circuit framework 
(where circuits are interpreted graphically). In particular, the usual notation 
with the tensor product symbol requires that we foliate the circuit first. A 
foliation is additional structure put in by hand. 

7.3 Operators are fully decomposable 

One of the beautiful features of complex Hilbert spaces is that operators acting 
on them (i.e. in the space $V$) are fully decomposable. This fact follows from 
(113). This is not true of operators acting on real Hilbert spaces. 

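Full decomposability of operators. It is a fact that any operator 
is equal to a linear combination of operators each of which consists 
of a result operator for each input and a preparation operator for 
each output. We do not lose any generality by choosing these to be 
fiducial sets since any other set could be written as a linear 
combination of the fiducial set. Hence, this is equivalent to 
the statement that any operator, $\hat{A}^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3}$, can be written as 

$$\hat{A}^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3} = {}^{d_4e_5\dots f_6}A_{a_1b_2\dots c_3}\,\hat{X}^{a_1}_{\mathsf{a}_1}\hat{X}^{b_2}_{\mathsf{b}_2}\cdots\hat{X}^{c_3}_{\mathsf{c}_3}\;{}_{d_4}\hat{X}^{\mathsf{d}_4}\,{}_{e_5}\hat{X}^{\mathsf{e}_5}\cdots{}_{f_6}\hat{X}^{\mathsf{f}_6} \tag{122}$$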
in symbolic notation, or 

[Diagram (123): the operator box $\hat{A}$, with input wires a, b, c and output wires d, e, f, set equal to the duotensor $A$ with a fiducial operator attached to each wire.]

in diagrammatic notation. 

Note that we have equality here since the space under consideration, $V^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3}$, 
is a linear space. In the case of operations in the previous section we had 
equivalence rather than equality for the assumption of full decomposability. We 
will develop a notion of equivalence for operators in the duotensor framework. 
Things look a little bit different than for operations. However, this makes no 
difference in the end. The analogy between the duotensor framework for oper- 
ations and the duotensor framework for operators is good but not perfect. 
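
A dimension count makes the contrast between complex and real Hilbert spaces concrete. The sketch below is illustrative only: it assumes two qubit-sized systems and uses the identity and Pauli matrices as a convenient spanning set of Hermitian operators (a choice made here for brevity, not the fiducial sets of the text). The helper `real_span_rank` is a hypothetical name.

```python
import numpy as np

def real_span_rank(mats):
    """Dimension of the real linear span of a list of matrices."""
    rows = [np.concatenate([m.real.ravel(), m.imag.ravel()]) for m in mats]
    return int(np.linalg.matrix_rank(np.array(rows)))

I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

complex_basis = [I2, sx, sy, sz]   # spans Hermitian operators on C^2
real_basis = [I2, sx, sz]          # spans symmetric operators on R^2

# Products of local operators for the composite system.
complex_products = [np.kron(p, q) for p in complex_basis for q in complex_basis]
real_products = [np.kron(p, q) for p in real_basis for q in real_basis]

rank_complex = real_span_rank(complex_products)
rank_real = real_span_rank(real_products)

dim_hermitian_4 = 4 * 4            # real dimension of 4x4 Hermitian matrices
dim_symmetric_4 = 4 * (4 + 1) // 2 # real dimension of 4x4 real symmetric matrices
```

In the complex case the local products span all 16 dimensions of the composite operator space. In the real case they span only 9 of the 10 dimensions, so some operators on the composite real space cannot be written as linear combinations of local products.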



7.4 Wires, fragments, and circuits 

We have added a certain structure to the operators we have considered. Namely, 
that they have inputs and outputs labeled by types. We will now see how this 
structure enables us to wire these operators together to build operators with 
even more structure. We can use a wire to join an output to an input of the 
same type. Consider, first, a simple example 




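[Diagram (124): a preparation box $\hat{A}$ joined to a result box $\hat{B}$ by a single wire of type a.]

The wire here means that we take the trace of the product of $\hat{A}^{\mathsf{a}_1} \in V^{\mathsf{a}_1}$ and 
$\hat{B}_{\mathsf{a}_1} \in V_{\mathsf{a}_1}$. I.e. this is equal to $\mathrm{Trace}(\hat{A}^{\mathsf{a}_1}\hat{B}_{\mathsf{a}_1})$ in more standard notation. Note 
that the trace of the product of two Hermitian operators is always real. 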

If we have a more complicated situation then the best way to understand 
what it means to place a wire is to use the full decomposability of the operators. 
Thus, we can write 

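$$\hat{A}^{\mathsf{b}_3\mathsf{c}_4}_{\mathsf{a}_1\mathsf{b}_2}\hat{B}^{\mathsf{d}_6\mathsf{c}_7}_{\mathsf{a}_5\mathsf{b}_3} = {}^{b_3c_4}A_{a_1b_2}\,\hat{X}^{a_1}_{\mathsf{a}_1}\hat{X}^{b_2}_{\mathsf{b}_2}\,{}_{b_3}\hat{X}^{\mathsf{b}_3}\,{}_{c_4}\hat{X}^{\mathsf{c}_4}\;\;{}^{d_6c_7}B_{a_5b_3}\,\hat{X}^{a_5}_{\mathsf{a}_5}\hat{X}^{b_3}_{\mathsf{b}_3}\,{}_{d_6}\hat{X}^{\mathsf{d}_6}\,{}_{c_7}\hat{X}^{\mathsf{c}_7} \tag{125}$$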
or, in diagrammatic notation, 

[Diagram (126): the same expression in diagrammatic form, with $\hat{A}$ and $\hat{B}$ joined by the wire of type b.]



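This operator is in the space 

$$V^{\mathsf{c}_4\mathsf{d}_6\mathsf{c}_7}_{\mathsf{a}_1\mathsf{b}_2\mathsf{a}_5} = V_{\mathsf{a}_1}\otimes V_{\mathsf{b}_2}\otimes V_{\mathsf{a}_5}\otimes V^{\mathsf{c}_4}\otimes V^{\mathsf{d}_6}\otimes V^{\mathsf{c}_7} \tag{127}$$

Note that we have taken the trace over the $\mathsf{b}_3$ space. 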

An operator fragment is the object resulting from wiring a bunch of operators 
together. This will be a Hermitian operator. We may have disjoint parts. We 
may have some open inputs and outputs. 

An operator circuit is the object resulting from wiring a bunch of operators 
together such that we have no open inputs or outputs left over. This will be 
equal to a real number. Since there are no open inputs or outputs it is not 
possible to join an operator circuit to an operator fragment with wires. 

7.5 Evolution 

If we have appropriate causal structure in the wiring - namely no closed loops - 
then we can impose an evolution picture on this operation structure at the level 
of equivalent operators. 
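The box 

[Diagram (128): a box $\hat{B}$ with input wire a and output wire b.]

corresponds to an operator on $V_{\mathsf{a}_1}\otimes V^{\mathsf{b}_2}$. However, we can use it to obtain a 
map acting on operators in $V^{\mathsf{a}}$ and producing operators in $V^{\mathsf{b}}$. Consider 

[Diagram (129): a preparation box $\hat{A}$ wired into the input of $\hat{B}$, leaving the output wire b open.]

This is an operator in $V^{\mathsf{b}_2}$. We define the map $\$_B[\cdot]$ by 

$$\$_B[\hat{A}^{\mathsf{a}_1}] := \hat{A}^{\mathsf{a}_1}\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1} \tag{130}$$

This is a linear map from $V^{\mathsf{a}_1}$ to $V^{\mathsf{b}_2}$. We call this a superoperator. 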
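
As a numerical illustration of the map in (130), the sketch below reads the wire as a partial trace over the input space, $\mathrm{Tr}_{\mathsf{a}}[(\hat{A}\otimes I)\hat{B}]$, with the tensor ordering input first, output second. That ordering, the function name `apply_superoperator`, and the choice of a Hadamard unitary as the example are assumptions made here for illustration. The operator for the unitary is built as $\sum_{ij}|j\rangle\langle i|\otimes U|i\rangle\langle j|U^\dagger$, i.e. the input transpose of the usual Choi-Jamiolkowski operator.

```python
import numpy as np

def apply_superoperator(B, E, Na, Nb):
    """Return Tr_a[(E (x) I_b) B] for B acting on (input a) (x) (output b)."""
    B4 = B.reshape(Na, Nb, Na, Nb)           # indices (a, b, a', b')
    return np.einsum('xy,ybxc->bc', E, B4)   # contract both input indices with E

# Hypothetical example: the operator associated with a unitary U (Hadamard).
U = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
# B4[a, b, a', b'] = U[b, a'] * conj(U[b', a])
B = np.einsum('bx,cy->ybxc', U, U.conj()).reshape(4, 4)

E0 = np.array([[1, 0], [0, 0]], dtype=complex)   # preparation |0><0|
out = apply_superoperator(B, E0, 2, 2)
```

With this construction the output coincides with $U E_0 U^\dagger$, so the operator on $V_{\mathsf{a}_1}\otimes V^{\mathsf{b}_2}$ does act as a linear map from preparations on a to preparations on b.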

If we wish to think in the standard way of having a state that evolves in 
time we can. We break the operator circuit up into fragments along some folia- 
tion lines and think of an initial state, represented by an operator, as evolving 
through the circuit. If we evolve using superoperators then we will get an opera- 
tor at each stage that is equal to the operator corresponding to the accumulated 
preparation fragment up to the given foliation line. It is important to note that 
we do not need to think in terms of a time evolving state. In particular, if we 
wish to calculate what some given operator circuit is equal to we can break it 
up into fragments in any way we wish. If we know the operator fragments then 
we can combine them (using the implicit circuit trace) and obtain the value of 
the operator circuit. 

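We can, by this method, also define an evolution that evolves an operator 
"backwards" from output to input. Thus, define the backward map, written here as $\overleftarrow{\$}_B[\cdot]$, by 

$$\overleftarrow{\$}_B[\hat{C}_{\mathsf{b}_2}] := \hat{C}_{\mathsf{b}_2}\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1} \tag{131}$$

This is a linear map from $V_{\mathsf{b}_2}$ to $V_{\mathsf{a}_1}$. 

The existence of the maps $\$_B$ and $\overleftarrow{\$}_B$ does not imply that these maps are 
invertible. 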



7.6 The operator hopping metric 

If we obtain duotensors from operator structures then the hopping metric is 
given by 

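$${}_{a_1}\!\bullet\!\!-\!\!\bullet^{a_1} := {}_{a_1}\hat{X}^{\mathsf{a}_1}\hat{X}^{a_1}_{\mathsf{a}_1} \tag{132}$$

and its inverse is represented by $\circ\!\!-\!\!\circ$. The entries in the hopping metric must 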
be real because we are taking the trace of the product of Hermitian operators. 
The inverse will, therefore, also have real entries. 
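
A small numerical example may help fix ideas. The sketch below assumes a qubit and takes four rank one projectors (onto $|0\rangle$, $|1\rangle$, $|+\rangle$, $|{+}i\rangle$) as both the preparation and the result fiducial set. This particular set, and the use of the same set on both sides, are illustrative assumptions rather than a choice made in the text.

```python
import numpy as np

def projector(v):
    """Rank one projector onto the (normalised) vector v."""
    v = np.asarray(v, dtype=complex)
    v = v / np.linalg.norm(v)
    return np.outer(v, v.conj())

# Assumed fiducial set: four projectors that span the 2x2 Hermitian matrices.
fiducials = [projector([1, 0]), projector([0, 1]),
             projector([1, 1]), projector([1, 1j])]
K = len(fiducials)

# Hopping metric: trace of the product of each preparation/result pair.
hop = np.array([[np.trace(fiducials[i] @ fiducials[j]) for j in range(K)]
                for i in range(K)])

# The entries are real, so the inverse can be taken over the reals.
hop_inverse = np.linalg.inv(hop.real)

expected = np.array([[1.0, 0.0, 0.5, 0.5],
                     [0.0, 1.0, 0.5, 0.5],
                     [0.5, 0.5, 1.0, 0.5],
                     [0.5, 0.5, 0.5, 1.0]])
```

The imaginary parts vanish because each entry is the trace of a product of two Hermitian operators, and the matrix is invertible because the four projectors are linearly independent.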

We can simplify any expression for an operator fragment by replacing matched 
fiducial pairs (i.e. those joined by a wire) by the hopping metric. We can then 
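cancel over black and white dots as we did in the previous section. Consider 
the example in (126) above. We obtain 

[Diagram (133): the fragment of (126) with the matched fiducial pair on the b wire replaced by the hopping metric.]

Associated with the fragment in (133) is the duotensor 

[Diagram (134): the duotensor obtained by joining $A$ and $B$ through the hopping metric, drawn with all white dots.]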

This duotensor provides the coefficients for the sum over fiducials. Two operator 
fragments that have the same duotensors after replacing all matched fiducial 
pairs with the hopping metric must be equal. Note that, to determine whether 
two fragments are equal, we must put the corresponding duotensors in the same 
form (for example, all white dots as shown here). 

In the above example there is only one wire. In a more complicated example 
we would have many. However many matched fiducial pairs we replace by the 
hopping metric, we will continue to have equality. This is one way to explicitly 
calculate the circuit trace implicit in the wiring for a general operator fragment. 

When we explicitly calculate the circuit trace implicit in the wiring of an 
operator circuit, this involves taking the trace over the entire space. Note that when we have more 
than one wire we are implicitly using the mathematical fact that, in conventional 
notation, $\mathrm{Trace}(A \otimes B) = \mathrm{Trace}(A)\,\mathrm{Trace}(B)$. 

8 Operations and operators 

In this section we will discuss four related topics. First we will discuss how to set 
up a correspondence between operations and operators such that the probability 
for a circuit is given by the corresponding operator circuit. Second, we will 
provide some mathematical definitions and theorems for the case of operators 
in the duotensor framework that are motivated by this correspondence. Third, 
we will show how to formulate quantum theory as a theory relating operations 
and operators within the duotensor framework. Finally we will discuss how to 
formulate quantum theory in a formalism local fashion. 

8.1 Correspondence 

In this section we will see how to use duotensors to link operations and operators. 

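Operation-operator correspondence. We will say that operations 
correspond to operators if there is a mapping from operations, 
$\mathsf{A}^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3}$, to operators, $\hat{A}^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3}$, such that the probability for any 
circuit comprised of operations is equal to the operator circuit obtained 
under this mapping. 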

It is important that, under this correspondence mapping, we have the same 
input-output structure. If operations correspond to operators then, for example, 

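$$\mathrm{Prob}(\mathsf{A}^{\mathsf{a}_1\mathsf{b}_2}\mathsf{B}^{\mathsf{c}_3\mathsf{a}_4}_{\mathsf{b}_2}\mathsf{C}_{\mathsf{a}_1\mathsf{c}_3\mathsf{a}_4}) = \hat{A}^{\mathsf{a}_1\mathsf{b}_2}\hat{B}^{\mathsf{c}_3\mathsf{a}_4}_{\mathsf{b}_2}\hat{C}_{\mathsf{a}_1\mathsf{c}_3\mathsf{a}_4} \tag{136}$$

This same example in diagrammatic form is 

[Diagram (137): the probability of the circuit built from the operations A, B, and C, set equal to the corresponding operator circuit.]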



We will now prove 

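T1 If we can associate fiducial result and preparation operators with 
fiducial result and preparation operations, 

$\mathsf{X}^{a_1}_{\mathsf{a}_1} \leftrightarrow \hat{X}^{a_1}_{\mathsf{a}_1}$ and ${}_{a_1}\mathsf{X}^{\mathsf{a}_1} \leftrightarrow {}_{a_1}\hat{X}^{\mathsf{a}_1}$ for $a_1 = 1$ to $K_{\mathsf{a}}$, for all types $\mathsf{a}$, 
such that 

$${}_{a_1}\hat{X}^{\mathsf{a}_1}\hat{X}^{a_1}_{\mathsf{a}_1} = \mathrm{Prob}({}_{a_1}\mathsf{X}^{\mathsf{a}_1}\mathsf{X}^{a_1}_{\mathsf{a}_1}) \tag{138}$$

then we can set up a correspondence from operations to operators 
such that the operation 

$$\mathsf{A}^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3} \equiv {}^{d_4e_5\dots f_6}A_{a_1b_2\dots c_3}\,\mathsf{X}^{a_1}_{\mathsf{a}_1}\mathsf{X}^{b_2}_{\mathsf{b}_2}\cdots\mathsf{X}^{c_3}_{\mathsf{c}_3}\;{}_{d_4}\mathsf{X}^{\mathsf{d}_4}\,{}_{e_5}\mathsf{X}^{\mathsf{e}_5}\cdots{}_{f_6}\mathsf{X}^{\mathsf{f}_6} \tag{139}$$

corresponds to the operator 

$$\hat{A}^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3} = {}^{d_4e_5\dots f_6}A_{a_1b_2\dots c_3}\,\hat{X}^{a_1}_{\mathsf{a}_1}\hat{X}^{b_2}_{\mathsf{b}_2}\cdots\hat{X}^{c_3}_{\mathsf{c}_3}\;{}_{d_4}\hat{X}^{\mathsf{d}_4}\,{}_{e_5}\hat{X}^{\mathsf{e}_5}\cdots{}_{f_6}\hat{X}^{\mathsf{f}_6} \tag{140}$$

This defines a correspondence map from operations to operators. 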

Note that (138) is saying that the hopping metric for operations is the same as 
the hopping metric for operators. If we have equal hopping metrics then, for 
example, both sides of (137) are equal to 

[Diagram (141): the duotensors for A, B, and C joined through the common hopping metric.]

and therefore equal to each other. A similar result clearly follows for any circuit 
and hence the map in T1 does induce a correspondence from operations to 
operators. 

A second useful result follows from this 

T2 If we have correspondence between operators and operations 
with respect to one association of fiducial operations with fiducial 
operators then we will have correspondence with respect to any other 
association of fiducial operations with corresponding fiducial operators. 

Since we have said that the second association of fiducial operations is with 
corresponding fiducial operators (i.e. as prescribed by (139) and (140)) we must 
have equal hopping metrics. Hence, the result follows from T1. 

8.2 Physical operators 

In this subsection we provide some mathematical definitions and prove a few 
simple mathematical theorems for the operator-duotensor framework that are 
motivated by the physical considerations of the previous section. We will apply 
these mathematical definitions to give a succinct statement of quantum theory 
in the next subsection. 

A correspondence from operations to operators gives rise to a subset of 
operators, $\mathcal{O}^{\mathsf{b}}_{\mathsf{a}}$, for each system type pair, $(\mathsf{a},\mathsf{b})$. This subset consists of all the 
operators for which there is a corresponding operation for this given type pair. 
Note that $\mathsf{a}$ and $\mathsf{b}$ may be composite. We will call the collection of these subsets, 
$\{\mathcal{O}^{\mathsf{b}}_{\mathsf{a}} : \text{all types } \mathsf{a}, \mathsf{b}\}$, an operator superset. 

We define 

An operator superset is physical if 

1. The value of any operator circuit formed from operators in the 
operator superset is between 0 and 1. 

2. The operator superset contains preparation and result operators 
equal to all rank one projectors for every type. 

3. The operator superset contains result operators corresponding 
to the identity operator, $\hat{I}_{\mathsf{a}_1}$, for every type. 

This is a purely mathematical definition. It is, however, motivated by physical 
considerations. The motivation for the first property is so that the value of 
the operator circuit can be equal to a probability. The motivation for the 
second property comes from quantum theory. In quantum theory, pure states 
and maximal effects are represented by rank one projection operators. The 
motivation for the third property also comes from quantum theory. The identity 
operator, $\hat{I}_{\mathsf{a}_1}$, corresponds to the deterministic result $\mathsf{I}_{\mathsf{a}_1}$ (where the outcome set 
consists of all outcomes on the associated measurement). 

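Physical operators. An operator, 

$$\hat{B}^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3} \tag{142}$$

is said to be physical if 

$$0 \le \hat{A}^{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3\mathsf{g}_7}\hat{B}^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3}\hat{C}_{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6\mathsf{g}_7} \tag{143}$$

and 

$$\hat{A}^{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3\mathsf{g}_7}\hat{B}^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3}\hat{I}_{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6\mathsf{g}_7} \le 1 \tag{144}$$

for all rank one projection operators $\hat{A}^{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3\mathsf{g}_7}$ and $\hat{C}_{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6\mathsf{g}_7}$ and 
for all types $\mathsf{g}$. 

For emphasis, we have written our general operator, $\hat{B}^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3}$, with the possible 
composite nature of the input (ab...c) and the output (de...f) shown explicitly. 
We could equally write a general operator as $\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}$ where it is understood that $\mathsf{a}$ 
and $\mathsf{b}$ may be composite types. Note, incidentally, that the type $\mathsf{g}$ in the above 
definition also may be composite. 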
We prove the following theorem 

T3 Physical preparation operators, $\hat{B}^{\mathsf{a}_1}$, are positive and have trace 
less than or equal to one. Physical result operators, $\hat{B}_{\mathsf{a}_1}$, are positive 
and less than or equal to the identity, $\hat{I}_{\mathsf{a}_1}$. 

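An arbitrary physical preparation operator, $\hat{B}^{\mathsf{a}_1}$, must satisfy 

$$0 \le \hat{A}^{\mathsf{g}_2}\hat{B}^{\mathsf{a}_1}\hat{C}_{\mathsf{a}_1\mathsf{g}_2} \tag{145}$$

for all rank one projectors $\hat{A}^{\mathsf{g}_2}$ and $\hat{C}_{\mathsf{a}_1\mathsf{g}_2}$. In particular, we can choose $\hat{C}_{\mathsf{a}_1\mathsf{g}_2} = 
\hat{D}_{\mathsf{a}_1}\hat{E}_{\mathsf{g}_2}$ where $\hat{A}^{\mathsf{g}_2}\hat{E}_{\mathsf{g}_2} = 1$ and $\hat{D}_{\mathsf{a}_1}$ is an arbitrary rank one projector. Therefore, 

$$0 \le \hat{B}^{\mathsf{a}_1}\hat{D}_{\mathsf{a}_1} \tag{146}$$

for all rank one projectors, $\hat{D}_{\mathsf{a}_1}$. Hence, $\hat{B}^{\mathsf{a}_1}$ must be positive. That it must have 
trace less than or equal to one follows from (144) and the fact that $\hat{I}_{\mathsf{a}_1\mathsf{g}_2} = \hat{I}_{\mathsf{a}_1}\hat{I}_{\mathsf{g}_2}$. 
This gives 

$$\hat{A}^{\mathsf{g}_2}\hat{B}^{\mathsf{a}_1}\hat{I}_{\mathsf{a}_1}\hat{I}_{\mathsf{g}_2} \le 1 \tag{147}$$

which, since $\hat{A}^{\mathsf{g}_2}\hat{I}_{\mathsf{g}_2} = 1$ for a rank one projector, gives 

$$\hat{B}^{\mathsf{a}_1}\hat{I}_{\mathsf{a}_1} \le 1 \tag{148}$$

Hence, $\hat{B}^{\mathsf{a}_1}$ has trace less than or equal to one. Now consider an arbitrary 
physical result operator, $\hat{B}_{\mathsf{a}_1}$. By definition, we have 

$$0 \le \hat{A}^{\mathsf{a}_1\mathsf{g}_2}\hat{B}_{\mathsf{a}_1}\hat{C}_{\mathsf{g}_2} \tag{149}$$

for all rank one projectors, $\hat{A}^{\mathsf{a}_1\mathsf{g}_2}$ and $\hat{C}_{\mathsf{g}_2}$. We can choose $\hat{A}^{\mathsf{a}_1\mathsf{g}_2} = \hat{D}^{\mathsf{a}_1}\hat{E}^{\mathsf{g}_2}$ where 
$\mathrm{Trace}(\hat{E}^{\mathsf{g}_2}\hat{C}_{\mathsf{g}_2}) = 1$. Hence, 

$$0 \le \hat{B}_{\mathsf{a}_1}\hat{D}^{\mathsf{a}_1} \tag{150}$$

Hence, $\hat{B}_{\mathsf{a}_1}$ must be positive. From (144) we have 

$$\hat{A}^{\mathsf{a}_1\mathsf{g}_2}\hat{B}_{\mathsf{a}_1}\hat{I}_{\mathsf{g}_2} \le 1 \tag{151}$$

for all $\hat{A}^{\mathsf{a}_1\mathsf{g}_2}$ equal to rank one projectors. In particular, we can choose $\hat{A}^{\mathsf{a}_1\mathsf{g}_2} = 
\hat{F}^{\mathsf{a}_1}\hat{G}^{\mathsf{g}_2}$ where both $\hat{F}^{\mathsf{a}_1}$ and $\hat{G}^{\mathsf{g}_2}$ are rank one projectors. It then follows that 

$$\hat{F}^{\mathsf{a}_1}\hat{B}_{\mathsf{a}_1} \le 1 \tag{152}$$

for all rank one projectors, $\hat{F}^{\mathsf{a}_1}$. This means that $\hat{I}_{\mathsf{a}_1} - \hat{B}_{\mathsf{a}_1}$ is positive or, 
equivalently, that $\hat{B}_{\mathsf{a}_1} \le \hat{I}_{\mathsf{a}_1}$. This proves T3. 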

For operators with an input and an output we can prove the following 

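T4 The superoperator $\$_B(\cdot)$, given by $\$_B(\hat{E}^{\mathsf{a}_1}) = \hat{E}^{\mathsf{a}_1}\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}$, is a completely 
positive trace non-increasing map if and only if the operator 
$\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}$ is physical. 

First we note that 

$$\hat{A}^{\mathsf{a}_1\mathsf{g}_3}\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1} = \$_B\otimes I(\hat{A}^{\mathsf{a}_1\mathsf{g}_3}) \tag{153}$$

where $\$_B\otimes I$ acts as $\$_B$ on $V^{\mathsf{a}_1}$ and as the identity on $V^{\mathsf{g}_3}$. We note that 

$$\hat{A}^{\mathsf{a}_1\mathsf{g}_3}\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}\hat{C}_{\mathsf{b}_2\mathsf{g}_3} = \mathrm{Trace}\big(\$_B\otimes I(\hat{A}^{\mathsf{a}_1\mathsf{g}_3})\,\hat{C}_{\mathsf{b}_2\mathsf{g}_3}\big) \tag{154}$$

(in an obvious but slightly ad hoc notation). Assume that $\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}$ is physical. 
Hence, it follows from (143) that 

$$0 \le \mathrm{Trace}\big(\$_B\otimes I(\hat{A}^{\mathsf{a}_1\mathsf{g}_3})\,\hat{C}_{\mathsf{b}_2\mathsf{g}_3}\big) \tag{155}$$

for all rank one projectors $\hat{A}^{\mathsf{a}_1\mathsf{g}_3}$ and $\hat{C}_{\mathsf{b}_2\mathsf{g}_3}$. Hence, $\$_B(\cdot)$ is completely positive. 
It follows from (144) that 

$$\mathrm{Trace}\big(\$_B\otimes I(\hat{A}^{\mathsf{a}_1\mathsf{g}_3})\,\hat{I}_{\mathsf{b}_2\mathsf{g}_3}\big) \le 1 \tag{156}$$

for all rank one projectors, $\hat{A}^{\mathsf{a}_1\mathsf{g}_3}$. Consider $\hat{A}^{\mathsf{a}_1\mathsf{g}_3} = \hat{E}^{\mathsf{a}_1}\hat{F}^{\mathsf{g}_3}$ where $\hat{E}^{\mathsf{a}_1}$ and $\hat{F}^{\mathsf{g}_3}$ 
are any rank one projectors. Since we have $\hat{I}_{\mathsf{b}_2\mathsf{g}_3} = \hat{I}_{\mathsf{b}_2}\hat{I}_{\mathsf{g}_3}$ we have 

$$\mathrm{Trace}\big(\$_B(\hat{E}^{\mathsf{a}_1})\,\hat{I}_{\mathsf{b}_2}\big) \le 1 \tag{157}$$

Rank one projectors have trace equal to one and they span the space of operators 
in $V^{\mathsf{a}_1}$. Hence $\$_B(\cdot)$ is a completely positive trace non-increasing map if $\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}$ 
is physical. Using (154) in a similar way we easily obtain the result that $\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}$ 
is physical if $\$_B(\cdot)$ is a completely positive trace non-increasing map. This 
proves T4. 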

We can now prove the following theorem. 

T5 An operator superset is physical if and only if every operator in 
it is physical (and it contains preparation and result operators equal 
to all rank one projectors and result operators equal to the identity 
for every type). 

The parenthetical remark is necessary because of the way physical operator 
supersets are defined. The "only if" part follows immediately because the 
conditions (143, 144) are imposed if the operator superset is physical. To prove 
that if an operator superset has only physical operators it must be a physical 
operator superset we need to prove that any operator circuit built out of 
physical operators will be equal to a number between zero and one. To this end 
we note that any operator circuit containing $\hat{B}^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3}$ can be written in the form 

$$\hat{D}^{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3\mathsf{g}_7}\hat{B}^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3}\hat{E}_{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6\mathsf{g}_7} \tag{158}$$

for some ancillary system $\mathsf{g}_7$ which may be composite. The operator $\hat{D}^{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3\mathsf{g}_7}$ 
must be positive and have trace less than or equal to one (by T3). Hence it 
can be written as a convex sum of rank one projectors. By T3 the operator 
$\hat{E}_{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6\mathsf{g}_7}$ must be positive. Hence it can be written as a sum of rank one 
projectors weighted by positive numbers. Therefore, it follows from the fact 
that $\hat{B}^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3}$ is physical that (158) has a value between zero and one. This proves 
T5. This theorem is useful in that it allows us to characterize physical operator 
supersets by a condition on each of the elements. 

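Recall from Sec. 3.2 that a complete set of operations is a set of operations 
associated with the same apparatus use, having a given apparatus setting and 
disjoint outcome sets whose union is the set of all outcomes. Motivated by this 
we define 

A complete set of physical operators, $\{\hat{B}^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3}[l] : l = 1 \text{ to } L\}$, 
is a set for which each operator is physical and, further, 

$$\sum_{l=1}^{L}\hat{B}^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3}[l]\,\hat{I}_{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6} = \hat{I}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3} \tag{159}$$

The condition (159) is equivalent to the requirement that 

$$\sum_{l=1}^{L}\hat{A}^{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3}\hat{B}^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3}[l]\,\hat{I}_{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6} = 1 \tag{160}$$

for all rank one projectors, $\hat{A}^{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3}$. This condition is motivated by the physical 
constraint that the sum of probabilities over all outcomes should add to one. We 
note that all physical operators belong to at least one complete set of physical 
operators. 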

Sometimes we will be interested in using completely positive maps instead 
of the operators described here. For this purpose we note the following 

T6 The operators $\{\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}[l] : l = 1 \text{ to } L\}$ are a complete set of physical 
operators if and only if the superoperators in the set $\{\$^l_B(\cdot) : l = 1 \text{ to } L\}$, 
given by $\$^l_B(\hat{E}^{\mathsf{a}_1}) = \hat{E}^{\mathsf{a}_1}\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}[l]$, are each completely positive 
and their sum, $\sum_l \$^l_B$, is trace preserving. 

This follows immediately from T4 and the fact that the condition (160) is 
equivalent to the requirement that $\sum_l \$^l_B(\cdot)$ be trace preserving. 

8.3 Positivity of operators under input transpose 

The physical motivation for considering physical operators (and complete sets 
of these) is clear. However, it is not so clear how to recognize whether a par- 
ticular operator is physical (and whether a set of these is complete) without 
exhaustively checking the conditions for all rank one projectors and all auxiliary 
systems g. In fact they have a very simple characterization as we will now see. 
This characterization is inspired by the use of the Choi-Jamiolkowski operator 
by Chiribella, D'Ariano, and Perinotti in their "quantum combs" approach. We 
will review their approach in Sec. 8.5. Rather than using the Choi-Jamiolkowski 
operator, we will stick with $\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}$. We show that the partial transpose over the 
input space must be positive by employing a theorem (T8 below) adapted from 
a similar theorem in the work of Aharonov, Popescu, Tollaksen, and Vaidman 
[5] (these authors considered vectors in the Hilbert space rather than $V$). 

We will be interested in the partial transpose of operators. To define a 
transpose we need to work in a given basis for the underlying Hilbert space (to 
see that the transpose depends on the basis note that, if the basis is such that 
the matrix is diagonalized, taking the transpose leaves the matrix unaffected). 
Thus, when we take the transpose over a space, we will work in some standard 
basis. Fortunately, the results we obtain will not depend on which basis we 
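choose. We fix a basis for each system type. We fix the same basis in $\mathcal{H}_{\mathsf{a}}$ and 
$\mathcal{H}^{\mathsf{a}}$. We define the input transpose of an operator $\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1} \in V_{\mathsf{a}_1}\otimes V^{\mathsf{b}_2}$ to be the 
partial transpose of $\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}$ over the input space $V_{\mathsf{a}_1}$ in the standard basis. We will 
denote the input transpose of $\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}$ by $\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1^T}$. The output transpose is defined to 
be the partial transpose over the output space $V^{\mathsf{b}_2}$ in the standard basis. We 
denote the output transpose of $\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}$ by $\hat{B}^{\mathsf{b}_2^T}_{\mathsf{a}_1}$. The input transpose of $\hat{C}^{\mathsf{d}_4}_{\mathsf{a}_1\mathsf{b}_2\mathsf{c}_3}$ can 
be written as $\hat{C}^{\mathsf{d}_4}_{\mathsf{a}_1^T\mathsf{b}_2^T\mathsf{c}_3^T}$ or $\hat{C}^{\mathsf{d}_4}_{[\mathsf{a}_1\mathsf{b}_2\mathsf{c}_3]^T}$. We can also define objects such as 
$\hat{D}^{\mathsf{d}_4^T\mathsf{e}_5}_{\mathsf{a}_1^T\mathsf{b}_2\mathsf{c}_3^T}$, 
where we take the partial transpose of some of the input spaces and some of the 
output spaces. We note that $[\hat{D}^{\mathsf{d}_4\mathsf{e}_5^T}_{\mathsf{a}_1\mathsf{b}_2^T\mathsf{c}_3}]^T = \hat{D}^{\mathsf{d}_4^T\mathsf{e}_5}_{\mathsf{a}_1^T\mathsf{b}_2\mathsf{c}_3^T}$ and that positivity of 
$\hat{D}^{\mathsf{d}_4\mathsf{e}_5^T}_{\mathsf{a}_1\mathsf{b}_2^T\mathsf{c}_3}$ is equivalent to positivity of $\hat{D}^{\mathsf{d}_4^T\mathsf{e}_5}_{\mathsf{a}_1^T\mathsf{b}_2\mathsf{c}_3^T}$. We will prove 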

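T7 An operator fragment is unchanged if we take the partial transpose 
over spaces corresponding to any subset of the matched wires. 
For example, 

$$\hat{A}^{\mathsf{c}_3\mathsf{d}_4\mathsf{c}_7}_{\mathsf{a}_1\mathsf{b}_2}\hat{B}^{\mathsf{a}_1\mathsf{d}_6}_{\mathsf{a}_5\mathsf{c}_3\mathsf{c}_7} = \hat{A}^{\mathsf{c}_3^T\mathsf{d}_4\mathsf{c}_7}_{\mathsf{a}_1^T\mathsf{b}_2}\hat{B}^{\mathsf{a}_1^T\mathsf{d}_6}_{\mathsf{a}_5\mathsf{c}_3^T\mathsf{c}_7} \tag{161}$$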



To prove this we can expand out each operator into its fully decomposed form. 
We can then consider matched fiducial pairs such as ${}_{a_1}\hat{X}^{\mathsf{a}_1}$ with $\hat{X}^{a_1}_{\mathsf{a}_1}$. Since we 
are taking the circuit trace, we are taking the trace over such matched pairs. 
Hence, we have 

$${}_{a_1}\hat{X}^{\mathsf{a}_1}\hat{X}^{a_1}_{\mathsf{a}_1} \tag{162}$$

entering into the expression. This is just the hopping metric. It is easy to show 
that 

$${}_{a_1}\hat{X}^{\mathsf{a}_1}\hat{X}^{a_1}_{\mathsf{a}_1} = {}_{a_1}\hat{X}^{\mathsf{a}_1^T}\hat{X}^{a_1}_{\mathsf{a}_1^T} \tag{163}$$

using well known matrix properties (recall that it is implicit in the notation 
that we are taking the trace). This means the hopping metric is invariant under 
taking the transpose over the space associated with the given system. Hence, 
T7 follows. 
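
The matrix property behind (163) is that $\mathrm{Trace}(XY) = \mathrm{Trace}(X^TY^T)$, and the same holds wire by wire for partial traces. The following sketch checks this numerically for one wire. The dimensions, the random Hermitian operators, and the helper name `random_hermitian` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

def random_hermitian(n):
    """Hypothetical helper: a random n x n Hermitian matrix."""
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return M + M.conj().T

Na, Nc = 2, 3
A = random_hermitian(Na * Nc).reshape(Na, Nc, Na, Nc)   # indices (a, c, a', c')
B = random_hermitian(Nc)                                # acts on the c space

# Fragment with a single wire on the c space: Tr_c[ A (I_a (x) B) ].
fragment = np.einsum('acxd,dc->ax', A, B)

# Transpose both ends of the wire: partial transpose of A over c, transpose of B.
A_Tc = A.transpose(0, 3, 2, 1)                          # swap c <-> c'
fragment_T = np.einsum('acxd,dc->ax', A_Tc, B.T)

# Scalar version of the same fact for a fully wired pair.
X, Y = random_hermitian(3), random_hermitian(3)
full_trace = np.trace(X @ Y)
full_trace_T = np.trace(X.T @ Y.T)
```

Both comparisons agree to numerical precision, which is the content of T7 for this small case: transposing the two ends of a matched wire together leaves the fragment unchanged.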
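
Consider the operator fragment 

$$\hat{A}^{\mathsf{a}_1\mathsf{g}_3}\hat{C}_{\mathsf{b}_2\mathsf{g}_3} \tag{164}$$

where both $\hat{A}^{\mathsf{a}_1\mathsf{g}_3}$ and $\hat{C}_{\mathsf{b}_2\mathsf{g}_3}$ are rank one projectors. We have 

$$\hat{A}^{\mathsf{a}_1\mathsf{g}_3}\hat{C}_{\mathsf{b}_2\mathsf{g}_3} \in V^{\mathsf{a}_1}\otimes V_{\mathsf{b}_2} \tag{165}$$

We will prove 

T8 There exist rank one projectors $\hat{A}^{\mathsf{a}_1\mathsf{g}_3}$ and $\hat{C}_{\mathsf{b}_2\mathsf{g}_3}$ such that the 
input transpose of $\hat{A}^{\mathsf{a}_1\mathsf{g}_3}\hat{C}_{\mathsf{b}_2\mathsf{g}_3}$ is equal to any rank one projector in 
$V^{\mathsf{a}_1}\otimes V_{\mathsf{b}_2}$. A similar result is true for the output transpose. These 
results are true for any choice of standard basis for taking the transpose. 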

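First we need to develop a little notation. We use the notation developed in [2]. 
Any rank one preparation projector can be written 

$$\hat{D}^{\mathsf{a}_1} = |D^{\mathsf{a}_1})(D^{\mathsf{a}_1}| \in \mathcal{H}^{\mathsf{a}_1}\otimes\bar{\mathcal{H}}^{\mathsf{a}_1} \tag{166}$$

where 

$$|D^{\mathsf{a}_1}) \in \mathcal{H}^{\mathsf{a}_1} \quad\text{and}\quad (D^{\mathsf{a}_1}| \in \bar{\mathcal{H}}^{\mathsf{a}_1} \tag{167}$$

These Hilbert spaces have dimension $N_{\mathsf{a}}$. We have 

$$\hat{A}^{\mathsf{a}_1\mathsf{g}_3}\hat{C}_{\mathsf{b}_2\mathsf{g}_3} = (A^{\mathsf{a}_1\mathsf{g}_3}|C_{\mathsf{b}_2\mathsf{g}_3})(C_{\mathsf{b}_2\mathsf{g}_3}|A^{\mathsf{a}_1\mathsf{g}_3}) \tag{168}$$

To understand the RHS note that 

$$(C_{\mathsf{b}_2\mathsf{g}_3}|A^{\mathsf{a}_1\mathsf{g}_3}) \in \mathcal{H}^{\mathsf{a}_1}\otimes\bar{\mathcal{H}}_{\mathsf{b}_2} \quad\text{and}\quad (A^{\mathsf{a}_1\mathsf{g}_3}|C_{\mathsf{b}_2\mathsf{g}_3}) \in \mathcal{H}_{\mathsf{b}_2}\otimes\bar{\mathcal{H}}^{\mathsf{a}_1} \tag{169}$$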



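because, in each case, we take the inner product over the space corresponding 
to $\mathsf{g}_3$. Hence, the RHS of (168) is proportional to a rank one projector. We will 
now see that, by appropriate choice of $|A^{\mathsf{a}_1\mathsf{g}_3})$ and $(C_{\mathsf{b}_2\mathsf{g}_3}|$, $(C_{\mathsf{b}_2\mathsf{g}_3}|A^{\mathsf{a}_1\mathsf{g}_3})$ can be 
proportional to any vector in $\mathcal{H}^{\mathsf{a}_1}\otimes\bar{\mathcal{H}}_{\mathsf{b}_2}$. A general vector in $\mathcal{H}^{\mathsf{a}_1}\otimes\bar{\mathcal{H}}_{\mathsf{b}_2}$ can be 
written 

$$\sum_n |E^{\mathsf{a}_1}[n]) \otimes (V_{\mathsf{b}_2}[n]| \tag{170}$$

where $|E^{\mathsf{a}_1}[n])$ is some set of vectors (not necessarily normalized or orthogonal) 
in $\mathcal{H}^{\mathsf{a}_1}$ and $(V_{\mathsf{b}_2}[n]|$ is the orthonormal basis in $\bar{\mathcal{H}}_{\mathsf{b}_2}$ with respect to which we 
will take the transpose. We choose 

$$|A^{\mathsf{a}_1\mathsf{g}_3}) = \sum_n |E^{\mathsf{a}_1}[n])\,|W^{\mathsf{g}_3}[n]) \quad\text{and}\quad (C_{\mathsf{b}_2\mathsf{g}_3}| = \sum_n (V_{\mathsf{b}_2}[n]|\,(W_{\mathsf{g}_3}[n]| \tag{171}$$

where $|W^{\mathsf{g}_3}[n])$ is an orthonormal basis in $\mathcal{H}^{\mathsf{g}_3}$ and we choose $\mathsf{g}$ such that $N_{\mathsf{g}} \ge 
N_{\mathsf{a}}$. This choice immediately gives (170). Hence, we can have 

$$\hat{A}^{\mathsf{a}_1\mathsf{g}_3}\hat{C}_{\mathsf{b}_2\mathsf{g}_3} = \Big(\sum_n |E^{\mathsf{a}_1}[n]) \otimes (V_{\mathsf{b}_2}[n]|\Big)\Big(\sum_m (E^{\mathsf{a}_1}[m]| \otimes |V_{\mathsf{b}_2}[m])\Big) \tag{172}$$

The input transpose of this (in the $|V_{\mathsf{b}_2}[n])$ basis) is 

$$\hat{A}^{\mathsf{a}_1\mathsf{g}_3}\hat{C}_{\mathsf{b}_2^T\mathsf{g}_3} = \Big(\sum_n |E^{\mathsf{a}_1}[n]) \otimes |V_{\mathsf{b}_2}[n])\Big)\Big(\sum_m (E^{\mathsf{a}_1}[m]| \otimes (V_{\mathsf{b}_2}[m]|\Big) \tag{173}$$

As long as 

$$\sum_n (E^{\mathsf{a}_1}[n]|E^{\mathsf{a}_1}[n]) = 1 \tag{174}$$

the operator in (173) is a rank one projector in $V^{\mathsf{a}_1}\otimes V_{\mathsf{b}_2}$. Further, by appropriate 
choice of the $|E^{\mathsf{a}_1}[n])$'s, we can obtain any rank one projector in this way. The 
transpose of a projection operator is also a projection operator. Hence 

$$\hat{A}^{\mathsf{a}_1^T\mathsf{g}_3}\hat{C}_{\mathsf{b}_2\mathsf{g}_3} \tag{175}$$

is also a projection operator. This proves T8. 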
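
The construction in this proof can be checked numerically. The sketch below is an illustration under stated assumptions: small dimensions, randomly chosen $|E[n])$ scaled to satisfy (174), the computational basis for both $|V[n])$ and $|W[n])$, and an ancilla of dimension equal to the number of terms in the sum. In this sketch $|C)$ is normalised so that $\hat{C}$ is a genuine projector, which means the transposed fragment comes out as $1/N_{\mathsf{b}}$ times a rank one projector. The constant does not affect positivity.

```python
import numpy as np

rng = np.random.default_rng(0)
Na, Nb = 2, 3
Ng = Nb    # one ancilla basis vector per term of the sum (an assumption of this sketch)

# Vectors |E[n]) in H^a (rows of E), scaled so that sum_n (E[n]|E[n]) = 1.
E = rng.normal(size=(Nb, Na)) + 1j * rng.normal(size=(Nb, Na))
E /= np.linalg.norm(E)

ketA = E.T.copy()                                    # ketA[a, g] = E[g][a]
ketC = np.eye(Nb, Ng, dtype=complex) / np.sqrt(Nb)   # ketC[b, g], normalised

rhoA = np.einsum('ag,xh->agxh', ketA, ketA.conj())   # |A)(A|, indices (a, g, a', g')
rhoC = np.einsum('bg,yh->bgyh', ketC, ketC.conj())   # |C)(C|, indices (b, g, b', g')

# Wire on g: F[a, b, a', b'] = sum_{g, h} rhoA[a, g, a', h] * rhoC[b, h, b', g].
F = np.einsum('agxh,bhyg->abxy', rhoA, rhoC)

# Input transpose: swap b <-> b' (b is the open input of the fragment).
F_T = F.transpose(0, 3, 2, 1)

# Target: projector onto |Psi) = sum_n |E[n])|V[n]), scaled by 1/Nb.
psi = ketA                                           # psi[a, b] = E[b][a]
target = np.einsum('ab,xy->abxy', psi, psi.conj()) / Nb

eigs = np.linalg.eigvalsh(F_T.reshape(Na * Nb, Na * Nb))
```

The transposed fragment has a single non-zero eigenvalue, and varying the rows of `E` sweeps out every normalised vector $|\Psi)$ on the composite space.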

We can now prove the following important theorem 

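T9 Physicality. An operator, $\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}$, is physical if and only if (a) its 
input transpose is positive ($0 \le \hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1^T}$) and (b) it satisfies 

$$\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}\hat{I}_{\mathsf{b}_2} \le \hat{I}_{\mathsf{a}_1} \tag{176}$$

The first point follows from T7 and T8. By T7 we note that 

$$\hat{A}^{\mathsf{a}_1\mathsf{g}_3}\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}\hat{C}_{\mathsf{b}_2\mathsf{g}_3} = \hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1^T}\big(\hat{A}^{\mathsf{a}_1^T\mathsf{g}_3}\hat{C}_{\mathsf{b}_2\mathsf{g}_3}\big) \tag{177}$$

For $\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}$ to be physical we require $0 \le \hat{A}^{\mathsf{a}_1\mathsf{g}_3}\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}\hat{C}_{\mathsf{b}_2\mathsf{g}_3}$. By (177) and T8 we see 
that this is equivalent to requiring that $\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1^T}$ is positive. To prove the second 
point, first note that $\hat{I}_{\mathsf{b}_2\mathsf{g}_3} = \hat{I}_{\mathsf{b}_2}\hat{I}_{\mathsf{g}_3}$. Hence 

$$\hat{A}^{\mathsf{a}_1\mathsf{g}_3}\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}\hat{I}_{\mathsf{b}_2\mathsf{g}_3} = \hat{E}^{\mathsf{a}_1}\big(\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}\hat{I}_{\mathsf{b}_2}\big) \tag{178}$$

where $\hat{E}^{\mathsf{a}_1} = \hat{A}^{\mathsf{a}_1\mathsf{g}_3}\hat{I}_{\mathsf{g}_3}$. Since we can choose $\hat{A}^{\mathsf{a}_1\mathsf{g}_3}$ to be a product of rank one 
projectors, $\hat{E}^{\mathsf{a}_1}$ can be equal to any rank one projector. It follows that either side 
of (178) is less than or equal to one if and only if (176) is satisfied. This proves 
T9. 

Theorem T9 is important because it means that we can define physical 
operators in the following way (this being equivalent to the previous definition). 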
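
Physical operators. An operator, $\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}$, is said to be physical if its 
input transpose, $\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1^T}$, is positive and it satisfies 

$$\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}\hat{I}_{\mathsf{b}_2} \le \hat{I}_{\mathsf{a}_1} \tag{179}$$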

While the physical motivation for the former definition was clearer, it is easier 
to check whether a particular operator is physical with this definition. Note 

• We could equivalently have stated that the output transpose must be 
positive. 

• We can use any basis for taking the partial transpose. Given that this 
condition is both necessary and sufficient when used with respect to any 
given standard basis, it follows that positivity of the input transpose is 
independent of which standard basis we adopt. 

• There are two distinct interesting things happening here that single out 
time from space. Time is represented here by the fact that we have an 
input-output structure. Space is represented by the fact that we can have 
composite systems. 

- Consider a general operator such as $\hat{D}^{\mathsf{d}_4\mathsf{e}_5}_{\mathsf{a}_1\mathsf{b}_2\mathsf{c}_3}$. The positivity of $\hat{D}^{\mathsf{d}_4\mathsf{e}_5}_{\mathsf{a}_1^T\mathsf{b}_2^T\mathsf{c}_3^T}$ 
(or equivalently of $\hat{D}^{\mathsf{d}_4^T\mathsf{e}_5^T}_{\mathsf{a}_1\mathsf{b}_2\mathsf{c}_3}$) implies that time is different from space. 
This is similar to the fact that we have a different sign associated 
with the time coordinate in the Minkowski metric. This analogy is 
quite strong because, apart from this difference (the partial transpose, 
or the minus sign) we otherwise treat space and time on the 
same footing. While this fact shows that time is distinct from space, 
it does not impose any time asymmetry. 

- The second requirement, given in (179), is time asymmetric. It indicates 
that the future does not influence the past. We do not, however, 
wish to assume that the past does not influence the future. However, 
in a fully time symmetric formulation of quantum theory, we would 
wish to treat the past and the future on the same footing. 

At this stage, it should be pointed out, we are just exhibiting some inter- 
esting mathematics. These remarks concerning time will become relevant 
to the physical situation in Sec. 8.4 when we show how to use these 
mathematical results to reformulate quantum theory. 
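
The two conditions of this definition are straightforward to test numerically. The sketch below is one possible check, with the tensor ordering (input first, output second), the tolerance, and the function names all chosen here for illustration. The two test operators are also assumptions of this sketch: the swap operator, which under this ordering represents the identity transformation, and the unnormalised maximally entangled projector, which is positive but has a non-positive input transpose.

```python
import numpy as np

def partial_transpose_input(B, Na, Nb):
    """Partial transpose over the input factor of B on (input a) (x) (output b)."""
    B4 = B.reshape(Na, Nb, Na, Nb)                 # indices (a, b, a', b')
    return B4.transpose(2, 1, 0, 3).reshape(Na * Nb, Na * Nb)

def trace_out_output(B, Na, Nb):
    """B contracted with the identity on the output: an operator on the input space."""
    return np.einsum('abxb->ax', B.reshape(Na, Nb, Na, Nb))

def is_physical(B, Na, Nb, tol=1e-9):
    """Check (a) positive input transpose and (b) B I_b <= I_a."""
    cond_a = np.linalg.eigvalsh(partial_transpose_input(B, Na, Nb)).min() >= -tol
    slack = np.eye(Na) - trace_out_output(B, Na, Nb)
    cond_b = np.linalg.eigvalsh(slack).min() >= -tol
    return bool(cond_a and cond_b)

# Swap operator on two qubits: entry [(a, b), (a', b')] = delta(a, b') delta(b, a').
swap = np.einsum('ay,bx->abxy', np.eye(2), np.eye(2)).reshape(4, 4)

# Unnormalised maximally entangled projector: the input transpose of swap.
phi = partial_transpose_input(swap, 2, 2)

swap_is_physical = is_physical(swap, 2, 2)
phi_is_physical = is_physical(phi, 2, 2)
```

The swap operator passes even though it has a negative eigenvalue, while the positive operator `phi` fails. This illustrates the remark above that it is positivity under the input transpose, not positivity of the operator itself, that singles out the input-output (time) direction.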

We can give the following definition for a complete set of physical operators 
which is equivalent to the (slightly different) definition we gave earlier. 

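A complete set of physical operators, $\{\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}[l] : l = 1 \text{ to } L\}$, is 
a set for which the input transpose of every operator is positive and 

$$\sum_{l=1}^{L}\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}[l]\,\hat{I}_{\mathsf{b}_2} = \hat{I}_{\mathsf{a}_1} \tag{180}$$

Note that it follows from the fact that $\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}[l]$ has positive input transpose that 
$\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}[l]\hat{I}_{\mathsf{b}_2}$ is positive (using T7 and the fact that $\hat{I}_{\mathsf{b}_2^T} = \hat{I}_{\mathsf{b}_2}$). It follows that each 
operator in a complete set of physical operators (by this new definition) satisfies 
$\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}[l]\hat{I}_{\mathsf{b}_2} \le \hat{I}_{\mathsf{a}_1}$. Hence, the operators in a complete set of physical operators are 
physical. 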
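
As a concrete instance of this definition, the sketch below builds a two-element set from the Kraus operators of an amplitude damping channel and checks condition (180) together with positivity of each input transpose. The channel, the damping parameter, the tensor ordering (input first, output second), and the helper name `operator_from_kraus` are illustrative assumptions. Each operator is constructed as $\sum_{ij}|j\rangle\langle i|\otimes K|i\rangle\langle j|K^\dagger$.

```python
import numpy as np

def operator_from_kraus(K):
    """Return B with indices (a, b, a', b') such that Tr_a[(E (x) I) B] = K E K^dagger."""
    Nb, Na = K.shape
    B = np.zeros((Na, Nb, Na, Nb), dtype=complex)
    for i in range(Na):
        for j in range(Na):
            # Input part |j><i| (row a = j, column a' = i), output part K|i><j|K^dagger.
            B[j, :, i, :] += np.outer(K[:, i], K[:, j].conj())
    return B

gamma = 0.3   # assumed damping probability
kraus = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex),
         np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)]
ops = [operator_from_kraus(K) for K in kraus]

# Condition (180): contracting each output with the identity and summing gives I_a.
total = sum(np.einsum('abxb->ax', B) for B in ops)

# Smallest eigenvalue of each input transpose (swap a <-> a', then view as a matrix).
min_eigs = [np.linalg.eigvalsh(B.transpose(2, 1, 0, 3).reshape(4, 4)).min()
            for B in ops]
```

Here the sum over the set reproduces the identity on the input, while each individual element only satisfies the inequality, in line with the note above.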

It is clear that operator fragments must also have positive input transpose 
and satisfy (176) as they can be put in transformation mode so that they are, 
effectively, operators. However, there will, in general, be further constraints on 
general operator fragments coming from the shape of the circuit. The analogous 
issue in the quantum combs framework has been considered in [12]. 

8.4 Two mathematical axioms for quantum theory 

Using the definitions and theorems of the previous two subsections, we can give 
the following statement of quantum theory. 

QUANTUM THEORY. The following two mathematical axioms 
specify quantum theory: 

Axiom 1 Operations correspond to operators. 

Axiom 2 Every complete set of physical operators corresponds to 
a complete set of operations. 

The operators here are understood to act on a complex Hilbert space. 

We could replace Axiom 1 by the requirement that, for every system type, we can 
associate a fiducial set of operators with a fiducial set of operations such that we 
get equal hopping metrics. Axiom 1 as given then follows as a consequence of 
T1. This restatement is mathematically simpler but verbally more demanding. 
We could combine the two axioms into the more pithy statement: 

QUANTUM THEORY: Every complete set of positive operators 
corresponds to a complete set of operations and vice versa. 

The "vice versa" part of this statement here is a little stronger than Axiom 1 
above. We will stick with the two axioms version in this paper. This provides 
a rather succinct statement of quantum theory. We will unpack it into a more 
familiar form. We note that these axioms imply the following statements 

1. The trace formula follows. The definition of the word "corresponds" im- 
plies that the probability for a circuit is equal to the operator circuit (in 
which the trace is implicit). 

2. Operations correspond to operators with positive input transpose having 
$\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}\hat{I}_{\mathsf{b}_2} \le \hat{I}_{\mathsf{a}_1}$. If we had a single operation for which this were not true then 
it would follow from Axiom 2 and T8 that we could form corresponding 
operator circuits having values less than zero or greater than one. Since 
the probability of a circuit is given by the operator circuit we demand 
that such a trace must be between zero and one. Hence all operations 
must correspond to physical operators. 

3. All operators having positive input transpose and $\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}\hat{I}_{\mathsf{b}_2} \le \hat{I}_{\mathsf{a}_1}$ correspond 
to operations. This follows immediately from Axiom 2 since all such 
operators belong to at least one complete set of positive operators. 

4. Preparations correspond to positive operators having trace less than or 
equal to one. To see this we must regard the input as the trivial type 
(having $N_{\mathsf{a}} = 1$). The identity is then just equal to one. This property 
then follows from $\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}\hat{I}_{\mathsf{b}_2} \le \hat{I}_{\mathsf{a}_1}$. 

5. All results correspond to positive operators that are less than or equal to $\hat{I}_{\mathsf{a}_1}$. 
To see this we regard the output as the trivial type. The result operator 
must clearly be positive (since the output is trivial). This property then 
follows from having $\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}\hat{I}_{\mathsf{b}_2} \le \hat{I}_{\mathsf{a}_1}$. 

6. The transformation associated with any operation is a completely positive 
trace non-increasing map. Since all operations correspond to physical 
operators this result follows from T4. 

We have given mathematical axioms for quantum theory above. The real 
objective of this paper is to provide operationally natural postulates for quantum 
theory. We will provide such a set of postulates in the next Part. We will 
reconstruct quantum theory in three parts. The first part will set up the basics 
concerning filters and systems. In the second part we will obtain the qubit. 
We will not need the machinery of the duotensor formalism for these first two 
parts. In the third part we will use the duotensor formalism to obtain quantum 
theory as characterized by the above two mathematical axioms. At this stage 
it is worth recalling the results that have to be proven in order to obtain these 
axioms. These are 

1. That we can find an association of fiducial operators with fiducial opera- 
tions such that we have equal hopping metrics for each type. 

2. That we have preparations and results corresponding to all rank one pro- 
jectors for every type. 

3. That we have a result corresponding to the identity operator for each type. 

4. That we can realize a complete set of operations corresponding to every 
complete set of physical operators. 

The first property guarantees, by T[TJ that we have a correspondence between 
operations and operators (and hence we have the trace rule for calculating
probabilities). The second and third properties actually follow from the fourth
property, but it is useful to list them separately since they motivate the requirement
that supersets of operators should be physical. Further, we will derive these 
properties from the postulates before we prove the fourth property. The fourth 
property is simply Axiom 2.

8.5 Choi-Jamiolkowski isomorphism and quantum combs



In this subsection we note the strong similarity between the duotensor mediated 
operator formulation of quantum theory given above and the "quantum combs" 
framework due to Chiribella, D'Ariano, and Perinotti (CDP) [T2]. In particular, 
the Choi-Jamiolkowski operator used by CDP is the input transpose of the 
operator used here and the link product used by CDP is equivalent to the circuit 
trace formula used here after some notational translation and the appropriate 
insertion of partial transposes. A precursor to both the link product of CDP 
and the "circuit trace product" used here is the causaloid product as applied to 
quantum theory in [35, 38]. All three provide a way of finding, in general, the
mathematical object associated with a fragment from the mathematical objects 
associated with smaller fragments that comprise this bigger fragment. These 
approaches pertain to the general mixed state case (with general transformations 
and general measurements). The multi-time approach of Aharonov, Popescu, 
Tollaksen, and Vaidman [2] and the general boundary approach of Oeckl [ft]] 
can be understood to be doing something similar but are restricted to the pure 
state case. In particular, the notation of [2] was employed in proving T8.

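First we will review the basic notation and equations of the quantum combs
approach. CDP take $\mathcal{L}(\mathcal{H})$ to be the set of linear operators on the finite
dimensional Hilbert space $\mathcal{H}$. The set of linear maps from $\mathcal{L}(\mathcal{H}_0)$ to $\mathcal{L}(\mathcal{H}_1)$ is
denoted $\mathcal{L}(\mathcal{L}(\mathcal{H}_0), \mathcal{L}(\mathcal{H}_1))$. A linear map in $\mathcal{L}(\mathcal{L}(\mathcal{H}_0), \mathcal{L}(\mathcal{H}_1))$ is denoted $\mathcal{M}$. A
superoperator is an example of such a linear map. There is a one to one
correspondence from linear maps, $\mathcal{M}$ in $\mathcal{L}(\mathcal{L}(\mathcal{H}_0), \mathcal{L}(\mathcal{H}_1))$, to linear operators, $M$,
in $\mathcal{L}(\mathcal{H}_1 \otimes \mathcal{H}_0)$ given by

$$M = \mathcal{M} \otimes \mathcal{I}_{\mathcal{L}(\mathcal{H}_0)} \left( |I_{\mathcal{H}_0}\rangle\rangle\langle\langle I_{\mathcal{H}_0}| \right) \qquad (181)$$

where $\mathcal{I}_{\mathcal{L}(\mathcal{H})}$ is the identity map on $\mathcal{L}(\mathcal{H})$ and

$$|I_{\mathcal{H}}\rangle\rangle = \sum_n |n\rangle|n\rangle \qquad (182)$$

Here $\{|n\rangle\}$ is a fixed orthonormal basis for the complex Hilbert space $\mathcal{H}$. The
map between $\mathcal{M}$ and $M$ in (181) is the Choi-Jamiolkowski isomorphism. One
can prove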

1. A linear map $\mathcal{M}$ is trace-preserving if and only if its Choi-Jamiolkowski
operator satisfies the property

$$\mathrm{Tr}_{\mathcal{H}_1}[M] = I_{\mathcal{H}_0} \qquad (183)$$

where $\mathrm{Tr}_{\mathcal{H}}$ denotes the partial trace over $\mathcal{H}$ and $I_{\mathcal{H}}$ is the identity operator
in $\mathcal{H}$. More generally, a trace non-increasing map has Choi-Jamiolkowski
operator satisfying

$$\mathrm{Tr}_{\mathcal{H}_1}[M] \le I_{\mathcal{H}_0} \qquad (184)$$

2. A linear map $\mathcal{M}$ is Hermitian preserving if and only if its Choi-Jamiolkowski
operator, $M$, is Hermitian.

3. A linear map $\mathcal{M}$ is completely positive if and only if its Choi-Jamiolkowski
operator, $M$, is positive.
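
As a numerical illustration of (181), (183) and (184), the following minimal sketch builds the Choi-Jamiolkowski operator of a qubit map from Kraus operators and takes the partial trace over the output space. The Kraus form, the amplitude damping example, and all function names are assumptions introduced here for illustration; they are not notation used by CDP or in this paper. Positivity is not tested since it is automatic for a map given in Kraus form.

```python
import math

def kron(a, b):
    """Kronecker product of two matrices stored as nested lists."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    return [[a[j][i].conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def choi(kraus_ops, d):
    """M = sum_k (K_k (x) I) |I>><<I| (K_k (x) I)^dagger on H_1 (x) H_0."""
    ident = [[1 if i == j else 0 for j in range(d)] for i in range(d)]
    # |I>> = sum_n |n>|n> as a column vector of length d*d, as in (182)
    vec = [[1 if i == j else 0] for i in range(d) for j in range(d)]
    proj = matmul(vec, dagger(vec))
    total = [[0] * (d * d) for _ in range(d * d)]
    for k in kraus_ops:
        big = kron(k, ident)
        term = matmul(matmul(big, proj), dagger(big))
        total = [[total[i][j] + term[i][j] for j in range(d * d)]
                 for i in range(d * d)]
    return total

def trace_out_output(m, d):
    """Partial trace over H_1 (the first tensor factor), leaving H_0."""
    return [[sum(m[n * d + i][n * d + j] for n in range(d))
             for j in range(d)] for i in range(d)]

# Hypothetical example: amplitude damping with damping parameter gamma.
gamma = 0.25
k0 = [[1, 0], [0, math.sqrt(1 - gamma)]]
k1 = [[0, math.sqrt(gamma)], [0, 0]]

full_channel = trace_out_output(choi([k0, k1], 2), 2)  # identity, as in (183)
one_branch = trace_out_output(choi([k0], 2), 2)        # below identity, as in (184)
```

Keeping only one Kraus branch gives a trace non-increasing map, so its reduced operator is bounded by the identity rather than equal to it.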

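The link product for finding the Choi-Jamiolkowski operator associated with the
composition $\mathcal{N} \circ \mathcal{M}$ (i.e. two sequential operations) is given by

$$N * M := \mathrm{Tr}_{\mathcal{H}_1}\left[ (I_{\mathcal{H}_2} \otimes M^{T_1})(N \otimes I_{\mathcal{H}_0}) \right] \qquad (185)$$

Here $M \in \mathcal{L}(\mathcal{H}_1 \otimes \mathcal{H}_0)$ and $N \in \mathcal{L}(\mathcal{H}_2 \otimes \mathcal{H}_1)$. The notation $M^{T_1}$ means we take
the partial transpose of $M$ in the space $\mathcal{H}_1$. The general link product applies to
the more general case in which we have maps with input and output spaces that
are tensor products of Hilbert spaces and where these maps are only composed
through some of these spaces (in the language of this paper this corresponds to
placing some wires between two general fragments). The general link product
is given by

$$N * M := \mathrm{Tr}_{\mathsf{M} \cap \mathsf{N}}\left[ (I_{\mathsf{N} \setminus \mathsf{M}} \otimes M^{T_{\mathsf{M} \cap \mathsf{N}}})(N \otimes I_{\mathsf{M} \setminus \mathsf{N}}) \right] \qquad (186)$$

Here $M \in \mathcal{L}(\bigotimes_{m \in \mathsf{M}} \mathcal{H}_m)$ and $N \in \mathcal{L}(\bigotimes_{n \in \mathsf{N}} \mathcal{H}_n)$. The set-subscript $\mathsf{X}$ refers
to the Hilbert space $\bigotimes_{i \in \mathsf{X}} \mathcal{H}_i$.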
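
The simple link product (185) can be checked numerically. The sketch below assumes each map has a single Kraus operator, so that the Choi-Jamiolkowski operator of the composition can be computed directly from the matrix product and compared with $N * M$. The function names and the two example matrices are hypothetical choices made here; integer and Gaussian-integer entries are used so that the comparison is exact.

```python
def kron(a, b):
    """Kronecker product of two matrices stored as nested lists."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    return [[a[j][i].conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def identity(d):
    return [[1 if i == j else 0 for j in range(d)] for i in range(d)]

def choi(kraus, d):
    """Choi-Jamiolkowski operator of the map X -> K X K^dagger."""
    vec = [[1 if i == j else 0] for i in range(d) for j in range(d)]
    big = kron(kraus, identity(d))
    return matmul(matmul(big, matmul(vec, dagger(vec))), dagger(big))

def transpose_first_factor(m, d):
    """Partial transpose over the first tensor factor (here H_1)."""
    return [[m[b * d + i][a * d + j] for b in range(d) for j in range(d)]
            for a in range(d) for i in range(d)]

def link_product(n, m, d):
    """N * M = Tr_{H_1}[(I_{H_2} (x) M^{T_1})(N (x) I_{H_0})], as in (185)."""
    left = kron(identity(d), transpose_first_factor(m, d))
    right = kron(n, identity(d))
    prod = matmul(left, right)
    # trace out the middle factor H_1 of H_2 (x) H_1 (x) H_0
    return [[sum(prod[c * d * d + a * d + i][e * d * d + a * d + j]
                 for a in range(d))
             for e in range(d) for j in range(d)]
            for c in range(d) for i in range(d)]

# Hypothetical single-Kraus maps: K maps H_0 to H_1, L maps H_1 to H_2.
K = [[0, 1], [1j, 0]]
L = [[1, 2], [0, 1j]]

linked = link_product(choi(L, 2), choi(K, 2), 2)
composed = choi(matmul(L, K), 2)
```

In this example the two operators `linked` and `composed` agree entry by entry, which is the content of (185) for sequential composition.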

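The tensor product notation is rather cumbersome (as pointed out in Sec.
7.2). An important part of the reformulation in this paper is to provide
notation that is sympathetic to the graphical structure of operator fragments. In
particular, with the notation introduced in Sec. 7.2 it is not necessary to pad
equations with identity operators. We will now translate the equations of CDP
into the notation of the present paper. Associated with any operator, $\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}$, is
the Choi-Jamiolkowski operator defined as

$$(\$_B \otimes I)\hat{J}^{\mathsf{a}_1\mathsf{a}_3} = \hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}\hat{J}^{\mathsf{a}_1\mathsf{a}_3} \qquad (187)$$

where $\$_B$ acts on $\mathcal{V}_{\mathsf{a}_1}$, $I$ acts as the identity on $\mathcal{V}_{\mathsf{a}_3}$, and we define

$$\hat{J}^{\mathsf{a}_1\mathsf{a}_3} := |J^{\mathsf{a}_1\mathsf{a}_3}\rangle\langle J^{\mathsf{a}_1\mathsf{a}_3}| \quad \text{with} \quad |J^{\mathsf{a}_1\mathsf{a}_3}\rangle = \sum_{n=1}^{N_{\mathsf{a}}} |U^{\mathsf{a}_1}[n]\rangle |U^{\mathsf{a}_3}[n]\rangle \qquad (188)$$

It is an easy calculation to show that

$$\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}\hat{J}^{\mathsf{a}_1\mathsf{a}_3} = \hat{B}^{\mathsf{b}_2}_{\mathsf{a}_3^T} \qquad (189)$$

(where the equality is understood to be numerical - strictly speaking this
equation is illegal since the subscripts and superscripts do not match). Here the
standard basis with respect to which the input transpose is taken is $\{|U^{\mathsf{a}_1}[n]\rangle\}$ as
used in the definition of $\hat{J}^{\mathsf{a}_1\mathsf{a}_3}$. Hence the Choi-Jamiolkowski operator is
equal to $\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1^T}$, which we know to be positive for physical operators. Physicality also
imposes that $\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}\hat{I}_{\mathsf{b}_2} \le \hat{I}_{\mathsf{a}_1}$ (where $\hat{I}_{\mathsf{a}_1}$ is the identity operator on $\mathcal{H}_{\mathsf{a}_1}$). Using T7
and the fact that $\hat{I}_{\mathsf{a}_1^T} = \hat{I}_{\mathsf{a}_1}$, we obtain

$$\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1^T}\hat{I}_{\mathsf{b}_2} \le \hat{I}_{\mathsf{a}_1} \qquad (190)$$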
which is the same as (184) in the present notation. If we have two operators
$\hat{A}^{\mathsf{b}_2}_{\mathsf{a}_1}$ and $\hat{B}^{\mathsf{c}_3}_{\mathsf{b}_2}$ which are wired together to form

$$\hat{A}^{\mathsf{b}_2}_{\mathsf{a}_1}\hat{B}^{\mathsf{c}_3}_{\mathsf{b}_2} \qquad (191)$$

The input transpose of this can, using T7, be written as

$$\hat{A}^{\mathsf{b}_2^T}_{\mathsf{a}_1^T}\hat{B}^{\mathsf{c}_3}_{\mathsf{b}_2^T} \qquad (192)$$

which is the same as the link product (185) in the present notation. More
generally, if we can have two operator fragments such as A^^ and B^ f * comprising
the bigger operator fragment

KtMll (193) 

We can think of this as the "circuit trace product" (as it is implicit in the 
notation that we are taking the circuit trace). Using T7, the input transpose
of this can be written as 

^C 3 d 4 £b2f 6 (194) 

which is the same as the general link product defined in (186) in the present
notation. 

The main difference between the approach here and that of CDP is that 
CDP work with an operator that is positive. To compensate for this they 
have to introduce partial transposes into the equation for the link product. 
Here, instead, the basic object is not positive (though its input transpose is). 
The pay-off for working with a non-positive object is that the equation for 
putting operators together (with implicit circuit trace) does not involve taking 
any partial transposes. This does raise the question of which object is more 
natural, $\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}$, or its input transpose, $\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1^T}$. The argument for $\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1^T}$ being more
natural is that it is positive. This is a mathematical argument. The argument
for $\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}$ being more natural is that it arises naturally as a sum over fiducial
operators (via its fully decomposed form). This is a physical argument since
these fiducial operators correspond to operations. Added mathematical support
for adopting $\hat{B}^{\mathsf{b}_2}_{\mathsf{a}_1}$ as the more natural object comes from the fact that we get
simpler equations for combining operators. This is particularly the case when 
we have more than two operators to combine. 

There are a few more subtle differences between the quantum combs and 
the duotensor mediated approaches that are worth mentioning here. First, the 
subscripts and superscripts play a deep and essential role in the reformulation
presented in this paper. It is the added structure associated with these
subscripts and superscripts that distinguishes plain old operators from the rather
more useful operator structures used here and sets up the connection with circuits
comprised of operations. Further, this subscript/superscript notation allows us
to make taking the partial trace implicit in the notation in a natural way. Such
a calculus of subscripts and superscripts does not appear in the quantum combs
approach. It is this calculus (with the accompanying abandonment of the $\otimes$
symbol) which accounts for the fact that the "circuit trace product" formula
appears much simpler than the link product formula. Second, we work with
spaces, $\mathcal{V}_{\mathsf{a}_1}$, of Hermitian operators rather than with the general spaces, $\mathcal{L}(\mathcal{H})$,
of linear maps as in the Choi-Jamiolkowski approach. Such a restriction is
motivated from the beginning by the physics since we wish to use full decomposability
of operators in analogy to the full decomposability of operations.

The duotensor framework is a consequence of pursuing the reasoning of the 
causaloid framework [38] (itself motivated by quantum gravity) in the context 
of the circuit model. The quantum combs approach of CDP was motivated by 
thinking about quantum information processing. It is interesting that these two 
routes should lead to similar formulations of quantum theory. CDP have proven 
numerous results in the quantum combs framework. It should be possible to 
take these over into the duotensor operator framework. 

8.6 Formalism locality for quantum theory 

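We note that we can write $E^{b_2}_{a_1}[i] = rE^{b_2}_{a_1}[j]$ (for duotensors) if and only
if $\hat{E}^{\mathsf{b}_2}_{\mathsf{a}_1}[i] = r\hat{E}^{\mathsf{b}_2}_{\mathsf{a}_1}[j]$. Hence, it follows from the results of Sec. 6.10 that the
probability ratio

$$\frac{\text{Prob}(\mathsf{E}[i])}{\text{Prob}(\mathsf{E}[j])} \qquad (195)$$

is well conditioned if and only if $\hat{E}^{\mathsf{b}_2}_{\mathsf{a}_1}[i] = r\hat{E}^{\mathsf{b}_2}_{\mathsf{a}_1}[j]$ (i.e. these two operators
are proportional to each other). If these operators are proportional then the
probability ratio is given by

$$\frac{\text{Prob}(\mathsf{E}[i])}{\text{Prob}(\mathsf{E}[j])} = r \qquad (196)$$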

Clearly this works for operator fragments in general and so we have a formalism 
local formulation of quantum theory. 
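
The proportionality test behind (195) and (196) is easy to state in code. The following minimal sketch, with a hypothetical helper name and hypothetical example matrices, returns the factor $r$ when one operator is a multiple of the other and returns nothing when the two are not proportional, which is the case in which the probability ratio is not well conditioned.

```python
def proportionality_factor(e_i, e_j, tol=1e-12):
    """Return r such that e_i = r * e_j entrywise, or None if no such r exists."""
    r = None
    # fix r from the first non-zero entry of e_j
    for row_i, row_j in zip(e_i, e_j):
        for x, y in zip(row_i, row_j):
            if r is None and abs(y) > tol:
                r = x / y
    if r is None:
        return None  # e_j vanishes, so the ratio is undefined
    # verify that the same r works for every entry
    for row_i, row_j in zip(e_i, e_j):
        for x, y in zip(row_i, row_j):
            if abs(x - r * y) > tol:
                return None
    return r

# Hypothetical operators: the first pair is proportional, the second is not.
e_j = [[0.5, 0.5], [0.5, 0.5]]
e_i = [[0.2, 0.2], [0.2, 0.2]]
ratio = proportionality_factor(e_i, e_j)

f_j = [[0.5, 0.0], [0.0, 0.5]]
f_i = [[0.2, 0.0], [0.0, 0.1]]
no_ratio = proportionality_factor(f_i, f_j)
```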

We can consider operator ratios such as

$$\frac{\hat{E}^{\mathsf{b}_2}_{\mathsf{a}_1}[i]}{\hat{E}^{\mathsf{b}_2}_{\mathsf{a}_1}[j]} \qquad (197)$$

We will say that two such operator ratios are equal if they are equal after 
canceling over all scalar factors. Further, we will say that two operator ratios are 
equal if one can be made equal to the other by multiplying one of the objects by 
an object of the form Aj^/Ajj? (this means we can cancel over an operator factor 
that appears in both the numerator and the denominator). In the case that the 
operator ratio is equal to a number we have a well conditioned probability ratio 
that is equal to this number. In the case that the operator ratio is not equivalent 
to a number then the probability ratio is not well conditioned. However, we can 
think of the operator ratio as providing a measure of "how well conditioned" 
the probability is. If the numerator and denominator are almost proportional 
then the probability ratio cannot vary too much. Hence, the operator ratio is 
still quantifying something physically meaningful. This provides an answer to 
the question posed by Arkani-Hamed in the introduction to g). Operator
ratios provide a way to do quantitative physics even when we do not have well 
conditioned probabilities. 

8.7 Disanalogies between operations and operators 

The operator framework was set up by analogy with the operation framework. 
However, there are a few points of disanalogy between operations and operators 
when we come to put them into the duotensor framework. First, as we noted 
already, operations are equivalent to a fully decomposed form whereas operators 
are equal to a fully decomposed form. In general, the notion of equivalence plays 
an important role in the operation framework but plays no role in the operator 
framework. Second, we note that if we are given $\mathsf{A}^{\mathsf{a}_1}\mathsf{B}^{\mathsf{b}_2}$, for example, it contains
full information as to what $\mathsf{A}^{\mathsf{a}_1}$ and $\mathsf{B}^{\mathsf{b}_2}$ are. This expression can be interpreted
simply as a list of operations. However, if we are provided with $\hat{A}^{\mathsf{a}_1}\hat{B}^{\mathsf{b}_2}$ (which
is the tensor product of the two operators) then we cannot quite get our hands
on $\hat{A}^{\mathsf{a}_1}$ and $\hat{B}^{\mathsf{b}_2}$ because it is not clear which term any overall factors belong to.
The disanalogy is stronger when we consider circuits because then we actually
multiply operators together and then take the trace. Given only the trace of
the product of two operators, such as $\hat{A}^{\mathsf{a}_1}\hat{C}_{\mathsf{a}_1}$, we cannot recover the original
operators. These disanalogies do not matter for formulating quantum theory
because, in the end, we are always interested in the case where we use operator
circuits to calculate probabilities. However, for future applications, it may be
useful to have a way of formulating operators within the duotensor framework that
is fully analogous to the way operations are formulated. One way to do this is to
reinterpret an expression like $\hat{A}^{\mathsf{a}_1}\hat{B}^{\mathsf{b}_2}\hat{C}_{\mathsf{a}_1}$ as a list. We could expand this out as
$(\hat{A}^{\mathsf{a}_1}, \hat{B}^{\mathsf{b}_2}, \hat{C}_{\mathsf{a}_1})$. In this list wires are indicated by repeated integer labels on the
type symbols. We could then proceed in exact analogy with the operation
case but defining a $t(\cdot)$ as a linear extension of $\text{Trace}(\cdot)$ in the same way that
the $p(\cdot)$ function is defined as a linear extension of $\text{Prob}(\cdot)$.

Part IV



Operational postulates for 
quantum theory 

In this part of the paper we will present a set of five operational postulates, P1,
P2, P3, P4', and P5. We then show how to reconstruct classical probability
theory and quantum theory from them. If P4' is replaced by the stronger
postulate, P4, then we will see that classical probability theory is ruled out.
Hence, quantum theory follows from P1-5.

In Sec. 9 we give the postulates and discuss each of them in turn. In Secs. 10-12
we give the reconstruction. In Sec. 10 we extract a number of general properties
of theories satisfying P1, P2, P3 and P4'. In Sec. 11 we use P5 in addition to
the other postulates to obtain the qubit of quantum theory in the non-classical
case. In Sec. 12 we use the formalism developed in Part III of this paper to
obtain quantum theory for the general case.

9 The postulates 

We will prove that, within the circuit framework presented in Part II, the
following postulates are consistent with classical probability theory and quantum
theory only.

P1 Sharpness. Associated with any given pure state is a unique maximal
effect giving probability equal to one. This maximal effect does not give
probability equal to one for any other pure state.

P2 Information locality. A maximal measurement on a composite system is 
effected if we perform maximal measurements on each of the components. 

P3 Tomographic locality. The state of a composite system can be determined 
from the statistics collected by making measurements on the components. 

P4' Permutability. There exists a reversible transformation on any system ef- 
fecting any given permutation of any given maximal set of distinguishable 
states for that system. 

P5 Sturdiness. Filters are non-flattening.

To single out quantum theory it suffices to add anything that is consistent with 
quantum theory and inconsistent with classical probability theory. One way to 
do this is to add the word "compound" to postulate P4': 

P4 Compound permutability. There exists a compound reversible transforma- 
tion on any system effecting any given permutation of any given maximal 
set of distinguishable states for that system. 
Recall that a compound transformation on a system is one which can be formed
from two sequential transformations on the system neither of which is equal to
the identity transformation. Postulates P1-5 give rise to quantum theory.

We will discuss the meaning and motivation of each postulate along with 
alternative statements in some cases. 

9.1 P1: Sharpness

Postulate P1 says that there is a one to one correspondence (a bijection) between
the set of pure states and the set of maximal effects such that when we send 
a pure state onto a measurement having the associated maximal effect (under 
this map) as one of its effects we will certainly get the outcome corresponding 
to this maximal effect. In the case that we follow a pure state by some other 
maximal effect (than the one given under this correspondence) we do not get 
probability equal to one. 

This is true in classical probability theory. Consider a system consisting of a 
ball that can be in one of N boxes. The pure states correspond to the ball being 
in a particular box with probability one. The maximal effects consist of looking
into a particular box. Clearly P1 is true here.

In quantum theory the pure states correspond to rank one projectors. The 
maximal effects also correspond to rank one projectors. The probability is given 
by taking the trace of the product of the state and effect projectors. Postulate 
P1 is clearly satisfied.
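
Both examples can be made concrete with a short calculation. The sketch below is an illustration under assumptions made here: the classical system is a ball in one of three boxes, the quantum system is a qubit, and the function names are hypothetical. For normalized vectors the trace of the product of the two rank one projectors reduces to the squared overlap, which is what the second function computes.

```python
import math

def classical_probability(state, effect):
    """Probability of an effect (indicator vector) on a probability vector."""
    return sum(p * e for p, e in zip(state, effect))

def overlap_probability(psi, phi):
    """Tr(|psi><psi| |phi><phi|) = |<phi|psi>|^2 for normalized vectors."""
    amplitude = sum(a.conjugate() * b for a, b in zip(phi, psi))
    return abs(amplitude) ** 2

# Classical case: ball in box 2 of 3, and the effects "look in box 2" / "look in box 1".
ball_in_box_2 = [0, 1, 0]
look_in_box_2 = [0, 1, 0]
look_in_box_1 = [1, 0, 0]

# Quantum case: three pure qubit states.
zero = [1, 0]
one = [0, 1]
plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]

same_state = overlap_probability(zero, zero)    # effect matched to the state
other_state = overlap_probability(plus, zero)   # same effect, different pure state
orthogonal = overlap_probability(one, zero)     # distinguishable state
```

In the quantum case the effect built from a given pure state gives probability one on that state and strictly less than one on the other two pure states, which is the behaviour P1 describes.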

Pure states are, in some sense, the most refined states. We can think of them 
as corresponding to the most basic statements we can make about the world. 
Maximal measurements are, in some sense, the most refined measurements. We 
can think of maximal effects as corresponding to the most basic propositions 
about the world we can have. It makes sense then, that there should be a 
unique correspondence between the most basic statements about the world and 
the most basic propositions as given by P1.

Note that there is a certain time asymmetry in this postulate. Purity is a 
different concept from maximality. This postulate associates purity coming from 
a preparation (the past) with maximality coming from a result (the future). Of 
course we could consider a postulate assuming that pure effects are associated 
with states in a maximal set of distinguishable states but we do not need this 
for the purposes of the reconstruction. 

A number of results follow from P1. We will provide these in Sec. 10.1
below. In particular, we will see that P1 implies that all pure states belong to
some maximal set of distinguishable states. The number of states in a maximal
distinguishable set of states, $N_{\mathsf{a}}$, is a constant associated with the system type.
This imposes the property that distinguishable sets of states that consist only
of pure states have the same number of elements. This rules out odd shaped
convex sets of states which do not have this property. We will also see that
causality follows from P1 (that the future does not influence the past). This is
an important property that is often taken as a background assumption in this 
kind of work. 
The name, "sharpness", is taken from Wilce's work [73]. Wilce has a similar
(though not exactly equivalent) postulate. 

9.2 P2: Information locality 

An alternative statement of this postulate is 

P2a For a composite system of type ab composed of systems of types a and b
we have $N_{\mathsf{ab}} = N_{\mathsf{a}} N_{\mathsf{b}}$.

To see this is equivalent we first note that P2 clearly implies P2a by counting.
To see that P2a implies P2 we note that we can perform maximal measurements
on each of the components. Such a composite measurement will distinguish at
least one set of $N_{\mathsf{a}} N_{\mathsf{b}}$ states (corresponding to preparing distinguishable states
in the original sets for each of the components). But, by P2a, this must
constitute a maximal set. Hence the measurement on the composite is a maximal
measurement as stated in P2.

Since information carrying capacity is defined as $\log_2 N_{\mathsf{a}}$ for a system of type
a, we have another alternative statement

P2b Information carrying capacity is additive for systems made up of compo- 
nents. 

This corresponds very well with our usual intuition. If we have a memory stick 
that carries 2 gigabytes and another that carries 8 gigabytes then, combined, 
they can carry 10 gigabytes of memory. 

We call this property information locality since the amount of global infor- 
mation a system can carry is simply given by adding together the local amounts. 

While this postulate is rather innocent looking, it is very powerful when 
used in conjunction with P4 as we will see. Further, it is possible to imagine 
situations in which P2 is not true [47]. For example, imagine systems of type
a consist of a die which has a locked door on one side, and systems of type b
consist of a key having a head on one side and a tails on the other (like a coin).
The die has $N_{\mathsf{a}} = 6$ and the key has $N_{\mathsf{b}} = 2$. But if we suppose that the key
unlocks the door on the die (when the two are proximate), and further, inside
the die is another key, then $N_{\mathsf{ab}} = 24$ rather than 12.
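
The counting in P2a and P2b, and its failure for the die and key, can be checked directly. This is a small sketch with hypothetical function names; the component sizes 4 and 8 are arbitrary values chosen here, while 6, 2 and 24 are the numbers from the die and key example above.

```python
import math

def capacity_bits(n_states):
    """Information carrying capacity, log2 of the number of distinguishable states."""
    return math.log2(n_states)

def is_additive(n_a, n_b, n_ab):
    """True when the capacity of the composite is the sum of the component capacities."""
    return math.isclose(capacity_bits(n_ab), capacity_bits(n_a) + capacity_bits(n_b))

# A composite obeying P2a: N_ab = N_a * N_b.
obeys_p2 = is_additive(4, 8, 4 * 8)

# The die and key: N_a = 6, N_b = 2, but N_ab = 24 rather than 12.
die_and_key = is_additive(6, 2, 24)
excess_bits = capacity_bits(24) - (capacity_bits(6) + capacity_bits(2))
```

The die and key composite carries one bit more than the sum of its parts, which is the extra key hidden inside the die.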

9.3 P3: Tomographic Locality 

Postulate P3 is often referred to as local tomography and has been much dis- 
cussed in the literature [3j [Jj [75J [56] . We call it tomographic locality to contrast 
it with information locality. We can translate it into mathematical language. 
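Let $\mathsf{A}^{\mathsf{a}_1\mathsf{b}_2}$ be a preparation for a composite system of type ab. The postulate
says that the probabilities

$$\text{Prob}(\mathsf{X}^{a_1}_{\mathsf{a}_1}\mathsf{X}^{b_2}_{\mathsf{b}_2}\mathsf{A}^{\mathsf{a}_1\mathsf{b}_2}) = A^{a_1 b_2} \qquad (198)$$

are sufficient to determine the state. Here $\mathsf{X}^{a_1}_{\mathsf{a}_1}$ and $\mathsf{X}^{b_2}_{\mathsf{b}_2}$ are fiducial results with
$a_1 = 1$ to $K_{\mathsf{a}}$ and $b_2 = 1$ to $K_{\mathsf{b}}$ for systems of types a and b respectively. Note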
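1. We need at least a full set of fiducial results at each end since our
preparations could be product preparations of the form $\mathsf{A}^{\mathsf{a}_1\mathsf{b}_2} = \mathsf{B}^{\mathsf{a}_1}\mathsf{C}^{\mathsf{b}_2}$ and
we could complete this preparation into a circuit with a product result,
$\mathsf{D}_{\mathsf{a}_1}\mathsf{E}_{\mathsf{b}_2}$. For the resulting circuit

$$\text{Prob}(\mathsf{B}^{\mathsf{a}_1}\mathsf{C}^{\mathsf{b}_2}\mathsf{D}_{\mathsf{a}_1}\mathsf{E}_{\mathsf{b}_2}) = \text{Prob}(\mathsf{B}^{\mathsf{a}_1}\mathsf{D}_{\mathsf{a}_1})\,\text{Prob}(\mathsf{C}^{\mathsf{b}_2}\mathsf{E}_{\mathsf{b}_2}) = (B^{a_1}C^{b_2})(D_{a_1}E_{b_2}) \qquad (199)$$

from the factorization property of Sec. 4.4. There are $K_{\mathsf{a}}K_{\mathsf{b}}$ linearly
independent joint states of the form $(B^{a_1}C^{b_2})$. Therefore we need at least
$K_{\mathsf{a}}K_{\mathsf{b}}$ results to determine the state. These could be the product of the
fiducial results, $\mathsf{X}^{a_1}_{\mathsf{a}_1}\mathsf{X}^{b_2}_{\mathsf{b}_2}$. Incidentally, this is true independently of the
assumption of local tomography. Hence, in general, we have $K_{\mathsf{ab}} \ge K_{\mathsf{a}}K_{\mathsf{b}}$.

2. We do not need more results at each end than the full set of fiducial results,
$\mathsf{X}^{a_1}_{\mathsf{a}_1}\mathsf{X}^{b_2}_{\mathsf{b}_2}$. To see this assume the contrary. Assume we also need at least
one more result, $\mathsf{Y}_{\mathsf{a}_1}\mathsf{Y}_{\mathsf{b}_2}$, consisting of a result at each end (in accord with
P3). Thus, we imagine we also need, at least, the probability

$$\text{Prob}(\mathsf{Y}_{\mathsf{a}_1}\mathsf{Y}_{\mathsf{b}_2}\mathsf{A}^{\mathsf{a}_1\mathsf{b}_2}) \qquad (200)$$

But

$$\text{Prob}(\mathsf{Y}_{\mathsf{a}_1}\mathsf{Y}_{\mathsf{b}_2}\mathsf{A}^{\mathsf{a}_1\mathsf{b}_2}) = Y_{a_1}\text{Prob}(\mathsf{X}^{a_1}_{\mathsf{a}_1}\mathsf{Y}_{\mathsf{b}_2}\mathsf{A}^{\mathsf{a}_1\mathsf{b}_2}) = Y_{a_1}Y_{b_2}\text{Prob}(\mathsf{X}^{a_1}_{\mathsf{a}_1}\mathsf{X}^{b_2}_{\mathsf{b}_2}\mathsf{A}^{\mathsf{a}_1\mathsf{b}_2}) = Y_{a_1}Y_{b_2}A^{a_1 b_2} \qquad (201)$$

where, to complete the first step we regard the circuit as the result $\mathsf{Y}_{\mathsf{a}_1}$
acting on the preparation $\mathsf{Y}_{\mathsf{b}_2}\mathsf{A}^{\mathsf{a}_1\mathsf{b}_2}$, and to complete the second step, we
regard the circuit in each term as the result $\mathsf{Y}_{\mathsf{b}_2}$ acting on the preparation
$\mathsf{X}^{a_1}_{\mathsf{a}_1}\mathsf{A}^{\mathsf{a}_1\mathsf{b}_2}$. We see that this additional probability we conjectured needing
can actually be calculated from the probabilities we already have. Hence,
we do not need any additional results at either end.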

The reasoning above leads to an alternative formulation of postulate P3. 

P3a For a composite system of type ab composed of systems of types a and b
we have $K_{\mathsf{ab}} = K_{\mathsf{a}}K_{\mathsf{b}}$.
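
As a quick check of P3a, the sketch below compares three standard parameter counts, taking $N_{\mathsf{ab}} = N_{\mathsf{a}}N_{\mathsf{b}}$ as in P2a. The counts are brought in here as assumptions rather than taken from this section: $K = N$ for classical probability theory, $K = N^2$ for quantum theory (the number of real parameters in an $N \times N$ Hermitian matrix), and $K = N(N+1)/2$ for quantum theory over real Hilbert spaces (real symmetric matrices). The function names are hypothetical.

```python
def k_classical(n):
    """Unnormalized probability vectors on n outcomes."""
    return n

def k_quantum(n):
    """Real parameters of an n x n Hermitian matrix."""
    return n * n

def k_real_quantum(n):
    """Real parameters of an n x n real symmetric matrix."""
    return n * (n + 1) // 2

def satisfies_p3a(k, n_a, n_b):
    """Check K_ab = K_a * K_b, taking N_ab = N_a * N_b as in P2a."""
    return k(n_a * n_b) == k(n_a) * k(n_b)

classical_ok = satisfies_p3a(k_classical, 2, 3)
quantum_ok = satisfies_p3a(k_quantum, 2, 3)
real_quantum_ok = satisfies_p3a(k_real_quantum, 2, 3)
```

For the real case with $N_{\mathsf{a}} = 2$ and $N_{\mathsf{b}} = 3$ the composite needs 21 parameters while the product of the local counts is only 18, so local measurements alone cannot fix the state there.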

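Now, since the probabilities $A^{a_1 b_2}$ determine the state, this object can be
taken to represent the state. For a result, $\mathsf{B}_{\mathsf{a}_1\mathsf{b}_2}$, acting on this preparation the
probability is therefore given by

$$\text{Prob}(\mathsf{A}^{\mathsf{a}_1\mathsf{b}_2}\mathsf{B}_{\mathsf{a}_1\mathsf{b}_2}) = A^{a_1 b_2}B_{a_1 b_2} \qquad (202)$$

where $B_{a_1 b_2}$ are the coefficients in the sum and represent the effect associated
with the result $\mathsf{B}_{\mathsf{a}_1\mathsf{b}_2}$. For transformations the same reasoning goes through as
in Sec. 5.9. Hence we have

$$\text{Prob}(\mathsf{A}^{\mathsf{a}_1\mathsf{b}_2}\mathsf{B}^{\mathsf{c}_3\mathsf{d}_4}_{\mathsf{a}_1\mathsf{b}_2}\mathsf{C}_{\mathsf{c}_3\mathsf{d}_4}) = A^{a_1 b_2}B^{c_3 d_4}_{a_1 b_2}C_{c_3 d_4} \qquad (203)$$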
These results generalize to more than two systems in the obvious way. We see
that, in this case, the probability for the circuit is given by an expression that
results from changing the sans serif font (inside the argument on the LHS) to
normal maths font (as seen on the RHS). This works in general as we will now
prove. Readers who read Part III have seen this result already for the
duotensor framework (where the duotensors are in standard form) using full
decomposability rather than local tomography.

T10 The probability for a general circuit is given by changing the 
sans serif font in the description of the circuit to normal maths font. 

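To prove this consider a general operation $\mathsf{B}^{\mathsf{c}_3}_{\mathsf{a}_1}$ (a and c can be composite so this
is a general operation). The most general circuit containing this operation can
be put in the form $\mathsf{A}^{\mathsf{a}_1\mathsf{b}_2}\mathsf{B}^{\mathsf{c}_3}_{\mathsf{a}_1}\mathsf{C}_{\mathsf{c}_3\mathsf{b}_2}$. Consider performing a product result, $\mathsf{Y}_{\mathsf{c}_3}\mathsf{Z}_{\mathsf{b}_2}$,
on the preparation $\mathsf{A}^{\mathsf{a}_1\mathsf{b}_2}\mathsf{B}^{\mathsf{c}_3}_{\mathsf{a}_1}$. Using reasoning similar to that used in (201) we see that

$$\text{Prob}(\mathsf{A}^{\mathsf{a}_1\mathsf{b}_2}\mathsf{B}^{\mathsf{c}_3}_{\mathsf{a}_1}\mathsf{Y}_{\mathsf{c}_3}\mathsf{Z}_{\mathsf{b}_2}) = Y_{c_3}B^{c_3}_{a_1}\text{Prob}(\mathsf{A}^{\mathsf{a}_1\mathsf{b}_2}\mathsf{X}^{a_1}_{\mathsf{a}_1}\mathsf{Z}_{\mathsf{b}_2}) = Y_{c_3}B^{c_3}_{a_1}Z_{b_2}\text{Prob}(\mathsf{A}^{\mathsf{a}_1\mathsf{b}_2}\mathsf{X}^{a_1}_{\mathsf{a}_1}\mathsf{X}^{b_2}_{\mathsf{b}_2}) = Y_{c_3}B^{c_3}_{a_1}Z_{b_2}A^{a_1 b_2} \qquad (204)$$

From tomographic locality, probabilities of this form (corresponding to product
results) determine the state. Hence, the state associated with the preparation
$\mathsf{A}^{\mathsf{a}_1\mathsf{b}_2}\mathsf{B}^{\mathsf{c}_3}_{\mathsf{a}_1}$ is $A^{a_1 b_2}B^{c_3}_{a_1}$. This means that the probability for the circuit
$\mathsf{A}^{\mathsf{a}_1\mathsf{b}_2}\mathsf{B}^{\mathsf{c}_3}_{\mathsf{a}_1}\mathsf{C}_{\mathsf{c}_3\mathsf{b}_2}$ is given by $A^{a_1 b_2}B^{c_3}_{a_1}C_{c_3 b_2}$. From this T10 follows because we can
apply this iteratively to replace each operation by the corresponding
transformation matrix.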
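
To see the contraction rule of T10 at work in the simplest setting, the sketch below evaluates a circuit of the form $A^{a_1 b_2}B^{c_3}_{a_1}C_{c_3 b_2}$ in classical probability theory. The choice of example is an assumption made here: two perfectly correlated bits, a stochastic matrix acting on the first bit, and a joint yes/no effect, with the fiducial elements taken to be the definite bit values. Function and variable names are hypothetical.

```python
from fractions import Fraction

def circuit_probability(A, B, C):
    """Sum over repeated labels of A[a][b] * B[c][a] * C[c][b]."""
    return sum(A[a][b] * B[c][a] * C[c][b]
               for a in range(len(A))
               for b in range(len(A[0]))
               for c in range(len(B)))

half = Fraction(1, 2)

# Preparation: two perfectly correlated bits.
correlated = [[half, 0], [0, half]]

# Transformations on the first bit, as B[output][input].
flip = [[0, 1], [1, 0]]
noisy = [[Fraction(3, 4), Fraction(1, 4)], [Fraction(1, 4), Fraction(3, 4)]]

# Joint effects on (transformed first bit, second bit), as C[c][b].
bits_equal = [[1, 0], [0, 1]]
bits_differ = [[0, 1], [1, 0]]

p_flip_equal = circuit_probability(correlated, flip, bits_equal)
p_flip_differ = circuit_probability(correlated, flip, bits_differ)
p_noisy_equal = circuit_probability(correlated, noisy, bits_equal)
```

Flipping one of two correlated bits makes them certainly different, and a flip applied with probability one quarter leaves them equal with probability three quarters, as the contraction reproduces.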

There is another formulation of P3. This is the assumption of full
decomposability discussed in Sec. 6.6 (and introduced in [44]). It is immediately clear
that full decomposability implies tomographic locality since it could be applied
to the special case where the operation is a preparation. That tomographic
locality implies full decomposability follows from T10 and the fact, as seen in
(203) above, that we can associate any operation $\mathsf{A}^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3}$ with a transformation
matrix $A^{d_4 e_5 \dots f_6}_{a_1 b_2 \dots c_3}$. It follows that the probability for a circuit is linear in this
matrix. Further, this matrix can be converted into ${}^{d_4 e_5 \dots f_6}A_{a_1 b_2 \dots c_3}$ by applying
the hopping metric. This corresponds to the duotensor having all white dots.
These are the coefficients for the expansion of an operation in terms of
fiducials. Hence full decomposability follows from P3. Yet another formulation of
tomographic locality is afforded by equation (101) as discussed in Sec. 6.8.

In Sec. 5.1 we defined two fragments to be equivalent (A = B) if they gave
the same probabilities when one is substituted for the other in any circuit. We
also defined a restricted notion for two fragments to be equivalent (A = B) if
they give the same probabilities when one is substituted for the other in any
circuit in which they are restricted to be in transformation mode. We prove

T11 It follows from tomographic locality that A = B if and only if
A = B for any two fragments A and B. 

This means equivalence under the restricted situation where the fragments are 
in transformation mode implies equivalence in the general situation. Consider 
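fiducial preparations $\{{}_{a_1}\mathsf{X}^{\mathsf{a}_1} : a_1 = 1 \text{ to } K_{\mathsf{a}}\}$ (these are a set of preparations
whose states constitute a spanning set for the given type). Consider a general
fragment $\mathsf{B}^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3}$. It follows from T10 that there is a matrix, $B^{d_4 e_5 \dots f_6}_{a_1 b_2 \dots c_3}$,
associated with this fragment given by multiplying together the matrices
corresponding to the operations that compose this fragment in accordance with the
given wiring. Consider the fiducial probabilities

$$\text{Prob}(\mathsf{B}^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3}\;{}_{a_1}\mathsf{X}^{\mathsf{a}_1}\,{}_{b_2}\mathsf{X}^{\mathsf{b}_2}\cdots{}_{c_3}\mathsf{X}^{\mathsf{c}_3}\;\mathsf{X}^{d_4}_{\mathsf{d}_4}\mathsf{X}^{e_5}_{\mathsf{e}_5}\cdots\mathsf{X}^{f_6}_{\mathsf{f}_6}) \qquad (205)$$

(in the duotensor framework these correspond to duotensors with all black dots).
The product preparations, ${}_{a_1}\mathsf{X}^{\mathsf{a}_1}\,{}_{b_2}\mathsf{X}^{\mathsf{b}_2}\cdots{}_{c_3}\mathsf{X}^{\mathsf{c}_3}$, correspond to a spanning set of
states for the input $\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3$. The product results, $\mathsf{X}^{d_4}_{\mathsf{d}_4}\mathsf{X}^{e_5}_{\mathsf{e}_5}\cdots\mathsf{X}^{f_6}_{\mathsf{f}_6}$, correspond
to a spanning set of effects for the output $\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6$. Hence, the fiducial
probabilities in (205) determine the matrix $B^{d_4 e_5 \dots f_6}_{a_1 b_2 \dots c_3}$ associated with the fragment
$\mathsf{B}^{\mathsf{d}_4\mathsf{e}_5\dots\mathsf{f}_6}_{\mathsf{a}_1\mathsf{b}_2\dots\mathsf{c}_3}$. Therefore, two fragments are equivalent if and only if they have the
same fiducial probabilities. Now we note that the circuit in (205) is in
transformation mode so T11 follows.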

9.4 P4': Permutability 

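Let $\{\mathsf{U}^{\mathsf{a}_1}[n] : n = 1 \text{ to } N_{\mathsf{a}}\}$ be a maximal distinguishable set of states with
corresponding maximal measurement $\{\mathsf{U}_{\mathsf{a}_1}[n] : n = 1 \text{ to } N_{\mathsf{a}}\}$. Then we have

$$\text{Prob}(\mathsf{U}^{\mathsf{a}_1}[n]\,\mathsf{U}_{\mathsf{a}_1}[n']) = \delta_{nn'} \qquad (206)$$

Postulate P4' says that, for any given permutation $\pi$ of the integers $n = 1$ to
$N_{\mathsf{a}}$, there exists a reversible transformation, $\mathsf{P}^{\mathsf{a}_2}_{\mathsf{a}_1}$, such that

$$\text{Prob}(\mathsf{U}^{\mathsf{a}_1}[n]\,\mathsf{P}^{\mathsf{a}_2}_{\mathsf{a}_1}\,\mathsf{U}_{\mathsf{a}_2}[n']) = \delta_{\pi(n)n'} \qquad (207)$$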

The permutation transformation can be thought of as permutating the states 
corresponding to the preparations. Alternatively, it can be thought of as acting 
on the maximal measurement giving rise to a new maximal measurement with 
correspondingly permuted effects. 

This is a natural requirement and, indeed, a very classical one since it applies 
to states in a maximal distinguishable set. From the point of view of information 
theory we can think of the states in the maximally distinguishable set as letters 
in an alphabet. Then P4' says we can perform a lossless arbitrary
translation of a message encoded with respect to one such alphabet to one 
encoded with respect to any permutation of this alphabet. 

An alternative way of stating P4' is that there exists a reversible transfor- 
mation permuting any pair of states in a maximal distinguishable set of states 
while leaving the other states in the set undisturbed. We could then implement 
a general permutation by many pairwise permutations. 
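
The last remark, that a general permutation can be built from pairwise permutations, can be sketched as follows. The function names and the zero-based labelling of the states are choices made here for illustration. A permutation is given as a list whose entry at position $n$ is the label that should end up there, and the routine returns a sequence of pairwise swaps that produces it from the identity arrangement.

```python
def pairwise_swaps(perm):
    """Return swaps (i, j) that turn the identity arrangement into perm."""
    current = list(range(len(perm)))
    swaps = []
    for i in range(len(perm)):
        if current[i] != perm[i]:
            j = current.index(perm[i])  # where the wanted label currently sits
            current[i], current[j] = current[j], current[i]
            swaps.append((i, j))
    return swaps

def apply_swaps(n, swaps):
    """Apply the swaps in order to the identity arrangement of n labels."""
    arrangement = list(range(n))
    for i, j in swaps:
        arrangement[i], arrangement[j] = arrangement[j], arrangement[i]
    return arrangement

target = [2, 0, 3, 1]
swaps = pairwise_swaps(target)
rebuilt = apply_swaps(len(target), swaps)
```

Each swap disturbs only two members of the set, so at most $N_{\mathsf{a}} - 1$ pairwise permutations are needed by this procedure.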

To single out quantum theory we add the word "compound" to P4' to get 
P4. For the case where $N_{\mathsf{a}} = 2$ the only permutation is the one that swaps
the states. Hence a compound transformation must consist of two (reversible)
transformations each of which does something other than effect the identity or
swap the states. This forces there to exist at least one more maximal set of
distinguishable states so we cannot be in the classical situation. That such
transformations should be compound is well motivated. In general, reversible
transformations are implemented by letting the system pass through some field
or through some piece of matter (such as a piece of glass). If, after passing
through length $L$ of this field or matter, a reversible permutation is effected,
then after passing through length $l < L$ some other reversible transformation
must be effected. Hence, the permutation transformation is compound: it consists
of the transformation after $l$ followed by the remaining transformation after a
further $L - l$ of the field or matter. In fact, to a very good approximation, $l$ can be varied
continuously and so we can argue that the transformation should be continuous.
However, this stronger requirement is not needed to reconstruct quantum theory
within the present postulate set (though it is used in [36]).
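
The $N_{\mathsf{a}} = 2$ case can be illustrated with a short calculation. The sketch below assumes that the reversible transformations of a classical bit are exactly the two permutation matrices, and it uses one particular unitary square root of the swap for the qubit; both the matrix `V` and the variable names are choices made here, not objects defined in this paper.

```python
from itertools import product

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

IDENTITY = [[1, 0], [0, 1]]
SWAP = [[0, 1], [1, 0]]

# Classical bit: the reversible transformations are the two permutations.
classical_reversible = [IDENTITY, SWAP]
classical_compounds = [matmul(x, y)
                       for x, y in product(classical_reversible, repeat=2)
                       if x != IDENTITY and y != IDENTITY]
swap_is_classically_compound = SWAP in classical_compounds

# Qubit: V is unitary, is neither the identity nor the swap, and V V = SWAP.
V = [[0.5 + 0.5j, 0.5 - 0.5j],
     [0.5 - 0.5j, 0.5 + 0.5j]]
quantum_swap = matmul(V, V)
```

The only candidate classical compound is the swap followed by the swap, which is the identity, so the classical swap is not compound. For the qubit, applying `V` twice swaps the two basis states, so the swap is compound there.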

9.5 P5: Sturdiness 

Postulate P5 states that filters are non-flattening. A filtering transformation is 
a pretty dramatic transformation. It completely kills that part of the state that 
is not in the support of the filter. The name "sturdiness" is apt since P5 asserts 
that the states are as sturdy as they can be in the circumstances (i.e. when 
subject to such a dramatic transformation). An instructive metaphor is the 
following. Consider a terraced row of houses numbered 1 through 5. Imagine 
that we suddenly destroy houses 1, 4, and 5 with some large mechanical device
that simply flattens them. If houses 2 and 3 remained intact (even though 
they had been adjoined to now demolished houses) then we would rightly think 
that terraces of houses of this type were as sturdy as they could be in the 
circumstances. 

Quantum theory satisfies P5 as discussed in the prelude and proven in the
Appendix. While we may be accustomed to thinking of quantum states as being
a little bit delicate, in fact they are pretty tough. 

Classical probability theory also satisfies P5 (as is obvious after a little 
thought). 

It is clearly possible for theories to be non-flattening as we have two
examples. Any theory in which filters did sometimes flatten would clearly be
impoverished in a certain respect. Such a theory would have the property that,
in a certain sense, filters destroy more information than necessary.

We conjecture in the postlude that P5 can be replaced by the postulate that
filters are non-mixing. There is certainly a very close link between the
non-mixing property and the non-flattening property. With a little help from P1
we can prove (see T21 below) that non-flattening transformations are also
non-mixing (by considering single member non-flat sets). It is proved in
Appendix E that, in quantum theory, non-mixing transformations are also
non-flattening. Further, any transformation that is non-mixing must ensure that
pure states do not end up inside the convex set of states. Flattening
transformations are likely to have the property that, upon flattening the set
of states, they send some pure states to mixed states. However, a proof of this
for the case treated in this paper is missing.

10 Filters and systems 

In this section we extract some general properties of theories satisfying P1, P2,
P3 and P4'.

10.1 Implications of sharpness 

We will say that an effect identifies a state if we get probability one for the
circuit comprised of the corresponding result and preparation. Postulate P1
says that each pure state is identified by one, and only one, maximal effect.
Further, it says that there is one, and only one, pure state that is identified by
any given maximal effect. This postulate allows us to prove a number of useful
theorems.

T12 Every pure state for a system is a member of at least one 
maximal set of distinguishable states for that system. 

Any pure state is associated with some maximal effect (by P1). Every maximal
effect must be associated with at least one (though possibly more than one)
maximal measurement. There must exist a maximal set of distinguishable states
distinguished by any such maximal measurement. We can substitute the
particular element of the maximal set of distinguishable states by the pure
state that is identified by the associated maximal effect. This gives us (what
might be) a new maximal set of distinguishable states that the given pure state
is a member of. This proves T12.

We also obtain

T13 Every state in a maximal distinguishable set is pure. 

Assume a given state, $U^{a_1}[n]$, in a maximal distinguishable set is mixed. Then
we can write it as a mixture of distinct pure states. Each pure state in this
mixture must be identified by the same maximal effect, $U_{a_1}[n]$. But this
contradicts P1. This proves T13.

A simple but useful result is 

T14 The only state that is identified by a given maximal effect is 
the associated pure state. 

We already know, from P1, that this is true for pure states. Assume that some
(mixed) state other than the associated pure state is identified by the given
maximal effect. Then it would follow that each pure state in the decomposition
of this state is also identified by this maximal effect. However, this would imply
that there is more than one pure state identified by the given maximal effect,
contradicting P1.

Associated with any maximal measurement is a maximal distinguishable set
of preparations. In fact we can prove

T15 All maximal measurements that distinguish the states in any
given maximal set of distinguishable preparations are equivalent.

We know from T13 that every state in any given maximal set of distinguishable
preparations is pure. We know from P1 that there is a unique effect associated
with any pure state. Hence, there is a unique set of effects associated with the
maximal measurement that distinguishes the given maximal set of
distinguishable states. This proves T15.

A very useful result follows from P1.

T16 For any maximal measurement, an outcome can only fire if it
can fire for some state in the associated maximal set of
distinguishable states.

To see this consider a maximal measurement with outcome sets $o[n]$ associated
with the effects. Assume that all the outcomes in $o[n]$ have a nonzero probability
of firing if the corresponding state from the maximal set of distinguishable states
is sent in. Let $o[0]$ be all the outcomes on the measurement that are not in any
of the sets $o[n]$. We can append this set to $o[1]$ to get a new maximal effect
associated with the first of the distinguishable states. This state must be pure
(by T13). Now, we know from P1 that there is one and only one maximal
effect identifying this pure state. It follows that we have the same maximal
effect whether we append $o[0]$ to $o[1]$ or not. The only way this can be true is
if the effect associated with $o[0]$ is the null effect: its outcomes never happen.
This proves T16.

We can now prove the following. 

T17 All pure states can correspond to deterministic preparations. 

Recall that a deterministic preparation is one for which the set of outcomes is
equal to the set of all possible outcomes. First we will show that we can construct
a deterministic preparation in a maximal distinguishable set. Let $U^{a_1}[n]$ be a
preparation in a maximal distinguishable set with outcome set $o_U[n]$. We have

$$\text{Prob}(U^{a_1}[n]\,U_{a_1}[m]) = \delta_{nm} \tag{208}$$

Now let $o'_U[n]$ be the full set of possible outcomes on the apparatus used for
$U^{a_1}[n]$. Let $U'^{a_1}[n]$ be the corresponding preparation with outcome
set $o'_U[n]$ (rather than $o_U[n]$). Then we can show that we must have

$$\text{Prob}(U'^{a_1}[n]\,U_{a_1}[m]) = \delta_{nm} \tag{209}$$

This is true since appending the extra outcomes to our specification of the
preparation cannot change the fact that we get probability 1 when we use result
$U_{a_1}[n]$. Hence, the other results in the maximal measurement must have
probability zero (as probabilities are non-negative and add up to one over a mutually
exclusive set of outcomes). Therefore (209) follows. Since every pure state
belongs to some maximal set of distinguishable states (by T12) any pure state
can correspond to a deterministic preparation. Hence we have proved T17.

We can now use T17 to prove the following important result

T18 Causality. All deterministic results on a given system type
are equivalent (i.e. they have the same effect).

Recall that a deterministic result is one whose set of outcomes is equal to the
full set of possible outcomes. To prove T18 we note that any state, $A^{a_1}$, can
be written as a convex sum over some set of pure states, $B^{a_1}[j]$ (with $j = 1$ to
$J$), which we can take to correspond to deterministic preparations by T17, and
the null state (corresponding to $j = 0$):

$$A^{a_1} = \sum_{j=0}^{J} \lambda_j B^{a_1}[j] \quad \text{where } \lambda_j \geq 0 \text{ and } \sum_{j=0}^{J} \lambda_j = 1. \tag{210}$$

Let $C_{a_1}$ and $D_{a_1}$ be two deterministic results. Then

$$\text{Prob}(A^{a_1} C_{a_1}) = A^{a_1} C_{a_1} = \sum_{j=0}^{J} \lambda_j B^{a_1}[j]\, C_{a_1} = \sum_{j=1}^{J} \lambda_j = 1 - \lambda_0 \tag{211}$$

since $B^{a_1}[j]\,C_{a_1} = 1$, as $B^{a_1}[j]\,C_{a_1}$ is a deterministic circuit for $j = 1$ to $J$ (this is
where we are using T17). For similar reasons,

$$\text{Prob}(A^{a_1} D_{a_1}) = 1 - \lambda_0 \tag{212}$$

This is true for any preparation $A^{a_1}$ and hence the two deterministic results
are equivalent. This proves T18. There is a connection between this proof and
Lemma 6 of [14] which states that a theory is causal if every state is proportional
to a state obtained by a deterministic preparation. In particular, this proof
uses the fact that every state is proportional to a state associated with a
deterministic preparation if all pure states can be regarded as corresponding to
deterministic preparations.
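The content of T18 can be checked numerically in the simplest setting. The following is a minimal sketch, assuming classical probability theory: states are (possibly subnormalized) probability vectors and each result is an effect vector of conditional probabilities. The two measurements and the state are hypothetical numbers chosen only for illustration, and the function names are not part of the framework.

```python
from fractions import Fraction as F

def deterministic_effect(measurement):
    """Sum the effect vectors over every outcome of a measurement.

    `measurement` is a list of effect vectors; entry x of effect y is the
    conditional probability of outcome y given classical state x.
    """
    n_states = len(measurement[0])
    return [sum(effect[x] for effect in measurement) for x in range(n_states)]

def prob(state, effect):
    """Probability for the circuit: preparation `state` followed by `effect`."""
    return sum(p * e for p, e in zip(state, effect))

# Two different measurements on a three-level classical system (hypothetical).
meas_c = [[F(1), F(0), F(1, 2)],
          [F(0), F(1), F(1, 2)]]
meas_d = [[F(1, 3), F(1, 4), F(0)],
          [F(1, 3), F(1, 4), F(1)],
          [F(1, 3), F(1, 2), F(0)]]

effect_c = deterministic_effect(meas_c)
effect_d = deterministic_effect(meas_d)

# A state with weight lambda_0 = 1/5 on the null state, as in the convex sum.
lambda_0 = F(1, 5)
state = [F(2, 5), F(1, 5), F(1, 5)]

print(effect_c == effect_d)                  # the two deterministic effects agree
print(prob(state, effect_c), 1 - lambda_0)   # both equal 1 - lambda_0
```

In this classical sketch both deterministic results reduce to the all-ones effect vector, so any preparation gives them the same probability, in line with (211) and (212).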

Consider a circuit. We can partition it into two parts with a synchronous
set of wires. One way to implement a deterministic result after the synchronous
set of wires is to ignore the outcomes on the apparatuses after these wires
(this amounts to having an outcome set consisting of all outcomes for these
apparatuses). T18 says that there is a unique deterministic effect. Hence, had
we a quite different set of apparatuses after the synchronous set of wires, or
the same apparatuses but with different knob settings, we would have the same
effect. Hence, we get the same probability. Thus, T18 implies that there is
no backward in time influence. The probability associated with the outcomes
up to any synchronous set of wires that partitions the circuit is, according to
T18, independent of what choices we make after this set of wires. This justifies
calling such sets of wires "synchronous". In [14, 13] (and also, effectively, in [40])
the equivalence of deterministic effects is taken as a basic assumption in the
framework.

Another way of reading T18 is that it implies no-signalling. To see this
consider

[Circuit diagram not recovered: a preparation $A$ with output wires $a_1$ and $b_2$, with operation $L$ on $a_1$ and operation $R$ on $b_2$.] (213)

The wire $b_2$ on the left hand side, by itself, constitutes a synchronous set. Hence,
we can think of operation $R$ as being after operations $A^{a_1 b_2} L_{a_1}$ (recall that such
diagrams are interpreted graphically so there is no significance to the vertical
position of the boxes on the page). If we ignore outcomes on the right hand
side (this is the same thing as taking our set of outcomes to be the full set
of outcomes), then $R$ is a deterministic result. In this case T18 implies that the
knob setting on $R$ does not influence the probability for the outcome on $L$.
More precisely, T18 implies no-signalling in circumstances where there is no
wire. Conversely, it implies that if there is signalling from one part of the setup to
another there must be at least one wire going from the first part to the second
part.
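The no-signalling reading can be illustrated with a small calculation. This is only a sketch under the assumption of classical probability theory: the preparation is a joint distribution over two bits, and each knob setting on $R$ is a table of conditional probabilities. The numbers and the names `joint`, `r_settings` and `marginal_on_left` are hypothetical.

```python
from fractions import Fraction as F

# Joint state p[(a, b)] of two classical bits (hypothetical, correlated).
joint = {(0, 0): F(1, 2), (0, 1): F(0), (1, 0): F(1, 10), (1, 1): F(2, 5)}

# Two knob settings for R: response[y][b] = Prob(outcome y | input b).
r_settings = {
    "setting_1": [[F(1), F(0)], [F(0), F(1)]],
    "setting_2": [[F(1, 3), F(3, 4)], [F(2, 3), F(1, 4)]],
}

def marginal_on_left(joint, response):
    """Probability of each value on the left when all outcomes on R are ignored."""
    left = {}
    for (a, b), p in joint.items():
        # Ignoring outcomes means summing over all of them: a deterministic result.
        ignored = sum(row[b] for row in response)
        left[a] = left.get(a, F(0)) + p * ignored
    return left

marginals = {name: marginal_on_left(joint, resp) for name, resp in r_settings.items()}
print(marginals["setting_1"] == marginals["setting_2"])
```

Because summing over all outcomes of $R$ gives the same effect for every knob setting, the statistics on the left cannot depend on that setting, even though the two bits are correlated.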

We now prove 

T19 Any reversible transformation on a system is equivalent to some 
deterministic transformation. 

What this means is that, if the outcome set associated with the reversible
transformation does not include all outcomes, then the outcomes which are not
included can never happen and, consequently, they can be included in the outcome
set and give rise to an equivalent transformation. Let $A^{a_1 b_2}$ be a deterministic
preparation and let $T_{a_1 b_2}$ be a deterministic result. Then $\text{Prob}(A^{a_1 b_2} T_{a_1 b_2}) = 1$.
Let $B$ be a reversible transformation on a system of type a. Let $\tilde{B}$ be the
inverse (so that $B$ followed by $\tilde{B}$ is the identity transformation). Let $\tilde{B}'$ be the
transformation associated with the same setup as $\tilde{B}$ but with an outcome set equal
to the set of all outcomes associated with this setup. Finally, let $B'$ be the
transformation associated with the same setup as $B$ but with an outcome set
equal to the set of all outcomes associated with this setup. Then we can show
that

[Chain of four circuit equivalences; diagram not recovered.] (214)

The first step follows since $B$ followed by $\tilde{B}$ is the identity transformation. The second
step follows since the probability for this circuit is already equal to one, so
adding extra outcomes to $\tilde{B}$ cannot change anything (these extra outcomes
cannot happen). The third step follows since $\tilde{B}'$ is deterministic and so $\tilde{B}'$ followed by $T$
must be deterministic. But it follows from T18 that all deterministic results on
a given type of system are equivalent. The last step follows since the probability
for the circuit is already equal to one so the extra outcomes associated with $B'$
over $B$ cannot happen. Given these equivalences, it follows that the probability
for the last circuit is also equal to one. Since the aforementioned extra outcomes
on $B'$ cannot happen, it follows that



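[Circuit equivalence not recovered: the circuit formed from the preparation $A$, the reversible transformation and the result $C$ is equivalent to the same circuit in which the transformation has its outcome set enlarged to all outcomes.] (215)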





for any result $C_{a_3 b_2}$. If this equivalence is true for deterministic preparations,
$A^{a_1 b_2}$, then it must be true for general preparations in place of $A^{a_1 b_2}$ also since the
states associated with deterministic preparations span the full space of states
(this follows from T17). With a general transformation in place of $A^{a_1 b_2}$, it is
true that any circuit containing the transformation $B$ can be put into the form
of the circuit on the left. Hence T19 follows.

We now prove a few results concerning non-flat sets of states. First we note 
that 

T20 The state in any single member non-flat set is parallel to a pure 
state. 

Consider a single member set of states comprised of a mixed state. We can
write the mixed state as a mixture of distinct pure states. Since these pure
states are distinct it follows from P1 that they cannot all be associated with
the same maximal effect. Hence, the mixed state must give rise to more than
one outcome for any maximal measurement (and, in particular, they must be
the outcomes associated with maximal effects since, according to T16, other
outcomes cannot happen). Hence, it can only belong to an informational subset
having capacity greater than or equal to 2. Since $K_a \geq N_a$, this means that this
single member set of states is necessarily flat. Hence, T20 follows. From this it
follows that

T21 All non-flattening transformations are non-mixing.

Consider a single member non-flat set of states. The state in this must be
parallel to a pure state. If this non-flat set is sent through a non-flattening
transformation then it must be non-flat afterwards also. Hence, the state in it
must continue to be parallel to a pure state. Hence T21 follows. This result is
useful since it means that P5 implies that filters are non-mixing.

10.2 Composite of two systems is a system

Systems are defined to be the thing we have after a filter. A proto-system is 
the thing we have after a "do-nothing" filter (i.e. the identity transformation). 
If we have two proto-systems then they clearly constitute a system since the 
composition of two identity filters in parallel like this is a filter itself. We will 
prove that this is generally true for filters in parallel. We start with two proto- 
systems. We apply filters to each of them so that we now have a composite of 
two systems that are not necessarily proto-systems. We will show 

T22 If we filter two proto-systems then we effect a filter on the 
composite. 

Let $F$ be a filter on a proto-system of type a. Let the associated informational
subset be $S_F$, formed from the maximal measurement $\{U_{a_1}[m] : m = 1 \text{ to } N_a\}$.
Let $G$ be a filter on a proto-system of type b. Let the associated informational
subset be $S_G$, formed from the maximal measurement $\{V_{b_2}[n] : n = 1 \text{ to } N_b\}$.
It follows from P2 that $\{U_{a_1}[m]V_{b_2}[n] : mn = 11, 12, \dots, N_a N_b\}$ is a maximal
measurement on the composite proto-system (of type ab). We define the
informational subset $S_{FG}$ to be associated with this maximal measurement on the
composite and outcome set

$$O(S_{FG}) = \{mn : m \in O(S_F),\ n \in O(S_G)\} \tag{216}$$

(this is just the cartesian product of $O(S_F)$ and $O(S_G)$). We will now show that
$FG$ is a filter on the composite proto-system and has $S_{FG}$ as the associated
informational subset. We have



If $A^{a_1 b_2} \in S_{FG}$ then Prob [circuit diagram not recovered] ... $mn \in O(S_{FG})$ (217)

By causality (T18), we must have $m \in O(S_F)$ (for the circuit to have non-zero
probability) when we perform the maximal measurement $\{U_{a_1}[m] : m = 1 \text{ to } N_a\}$
on the left whatever effect we have on the right. And likewise, we must have
$n \in O(S_G)$ when we perform the maximal measurement $\{V_{b_2}[n] : n = 1 \text{ to } N_b\}$ on
the right whatever effect we have on the left. We can regard $A^{a_1 b_2} V_{b_2}[n]$ as a
preparation of a system of type a. Further, since $m \in O(S_F)$, this prepares a state
in $S_F$ which will pass through the filter unchanged. Hence, for any effect $C_a$

If $A^{a_1 b_2} \in S_{FG}$ then [circuit equivalence not recovered] and $n \in O(S_G)$ (218)

where the = sign means that the probability for the two circuits is the same. It
follows that both the circuit $A^{a_1 b_2}$ followed by the filter $F$ and the effect $C$, and
the circuit $A^{a_1 b_2} C_{a_1}$, prepare a state in $S_G$. Therefore, for any effect $D_b$,

If $A^{a_1 b_2} \in S_{FG}$ then [circuit equivalence not recovered] (219)

Thus we see that for product effects $C_a D_b$, the state is unaffected by the presence
of the filters. But, according to P3, the state is fully characterized by product
effects. Hence, for any effect $B_{ab}$

If $A^{a_1 b_2} \in S_{FG}$ then [circuit equivalence not recovered] (220)

This proves that $FG$ passes states in $S_{FG}$. To complete the proof of T22 we need
to show it blocks states in $\bar{S}_{FG}$.

If $A^{a_1 b_2} \in \bar{S}_{FG}$ then Prob [circuit diagram not recovered] ... $mn \in O(\bar{S}_{FG})$ (221)

Therefore,

[circuit equation not recovered]

This is true for all effects $C_a$ and $D_b$. Therefore, using P3,

[circuit equation not recovered]

for all effects $B_{ab}$ on the composite. This completes the proof that $FG$ acts as a
filter on the composite of the two proto-systems.
We obtain 

T23 A composite of two systems is, itself, a system. 

This follows from T22 and the definition of a system.

This result is important since it means that, so far as composites are
concerned, we can reason about systems that have been obtained by filtering in
the same way as we reason about proto-systems. In particular, we can redo the
proof of T22 for filters on two general systems (not just proto-systems; see the
definition of a filter on a system given at the end of Sec. 5.3). This gives

T24 If we filter two systems then we effect a filter on the composite.

This result follows immediately by redoing the proof of T22 in the light of T23.
This would be relevant if we had a composite formed from two systems that had
been obtained by filtering and then we consider further filtering on them.



10.3 Some results for composite systems 

Consider a composite system comprised of systems each prepared in a pure 
state. We will prove the following theorem. 

T25 A composite system where each component is prepared in a 
pure state is, itself, in a pure state 

Consider a two component system with each component prepared in a pure
state. By T12 we know that each of these pure states belongs to a maximal
distinguishable set of pure states for each system taken separately. Consequently,
it follows from P3 that the composite of these two pure states is a member of a
maximal set of distinguishable states for the composite system.
It then follows from T13 that this state is pure. A multipartite system can be
regarded successively as a set of bipartite systems. Thus, abc can be taken as a
composite of ab and c. Then ab can be taken as a composite of a and b. If a
and b are prepared in pure states then ab is in a pure state and therefore, if c
is in a pure state, abc is in a pure state. Similar reasoning goes through for any
number of systems. Hence T25 follows.

We can use the deterministic effect to define a notion of marginals [13]. Thus,
if we have a composite system prepared by $C^{a_1 b_2}$ and we wish to ignore system
$b_2$ so we just have a preparation for system $a_1$ then we can assume we have
performed the deterministic result, $T_{b_2}$, on $b_2$. This gives us the state $C^{a_1 b_2} T_{b_2}$
for system $a_1$. We know from T18 that the deterministic effect is unique (it does
not matter what measurement we have performed on system $b_2$; if we ignore its
outcomes, we have the same effect). Hence, we can think of $C^{a_1 b_2} T_{b_2}$ as being
the state of system $a_1$ as it does not depend on what we do to the other particle.
Using this concept of the state of a component of a composite system, we have
the following theorem.

T26 If one system of a bipartite system is in a pure state then the 
state of the bipartite system is a product state 

It follows from tomographic locality (P3) that we can write the bipartite state
as $C^{a_1 b_2}$ (as shown in Sec. 9.3). Consider an arbitrary measurement $\{E_{b_2}[l]\}$ that
we can perform on system $b_2$. We have $T_{b_2} = \sum_l E_{b_2}[l]$ (since the deterministic
effect corresponds to an outcome set containing all outcomes). Hence,

$$A^{a_1} := C^{a_1 b_2} T_{b_2} = \sum_l C^{a_1 b_2} E_{b_2}[l] \tag{225}$$

If the state, $A^{a_1}$, of system $a_1$ is pure then we know that each term in the sum
must be proportional to $A^{a_1}$. This is because only a mixed state could be a sum
of distinct states (the convex weighting can be thought of as being absorbed
already into these terms). Hence,

$$C^{a_1 b_2} E_{b_2}[l] = \lambda_l A^{a_1} \tag{226}$$

Let the state of system $b_2$ be $B^{b_2} = T_{a_1} C^{a_1 b_2}$. We know that $A^{a_1} T_{a_1} = 1$ because
pure states are deterministic by T17. Hence $\lambda_l = B^{b_2} E_{b_2}[l]$. Therefore, if in
addition to having result $E_{b_2}[l]$ on $b_2$, we also have result $D_{a_1}$ on system $a_1$, then
we get

$$\text{Prob}(C^{a_1 b_2} D_{a_1} E_{b_2}[l]) = (A^{a_1} D_{a_1})(B^{b_2} E_{b_2}[l]) = \text{Prob}(A^{a_1} D_{a_1})\,\text{Prob}(B^{b_2} E_{b_2}[l]) \tag{227}$$

We see that the probabilities factorize if one system is pure. This is true for
any results $D_{a_1}$ and $E_{b_2}[l]$. In particular, it would be true for the set of fiducial
results on each system. Hence, the joint probabilities for fiducial results
must factorize. As these fully characterize the state by P3, the state must
factorize:

$$C^{a_1 b_2} = A^{a_1} B^{b_2} \tag{228}$$

This proves T26.
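The factorization can be seen directly in a toy case. The sketch below assumes classical probability theory, where a pure state is a point distribution and the marginal is obtained by summing over the ignored system (the classical deterministic effect). The two joint distributions are hypothetical examples, and `marginals` and `is_product` are illustrative names only.

```python
from fractions import Fraction as F

def marginals(joint):
    """Marginals of a joint distribution given as a table joint[a][b]."""
    n_a, n_b = len(joint), len(joint[0])
    p_a = [sum(joint[a][b] for b in range(n_b)) for a in range(n_a)]
    p_b = [sum(joint[a][b] for a in range(n_a)) for b in range(n_b)]
    return p_a, p_b

def is_product(joint):
    """True if every joint probability equals the product of its marginals."""
    p_a, p_b = marginals(joint)
    return all(joint[a][b] == p_a[a] * p_b[b]
               for a in range(len(p_a)) for b in range(len(p_b)))

# System a is certainly in state 1, so its marginal is pure.
pure_marginal = [[F(0), F(0), F(0)],
                 [F(1, 6), F(1, 3), F(1, 2)]]

# System a is mixed and correlated with system b.
correlated = [[F(1, 2), F(0), F(0)],
              [F(0), F(1, 4), F(1, 4)]]

print(is_product(pure_marginal), is_product(correlated))
```

When the marginal on the first system is a point distribution, every row of the table but one vanishes, so the joint distribution is forced to be a product; the correlated example shows that this fails once the marginal is mixed.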
We also have 

T27 If a bipartite system is both (i) in a pure state, and (ii) in a 
product state, then each of the components must be in a pure state 

To prove this, assume the contrary. Thus, let the state be $A^{a_1} B^{b_2}$. Assume that
$B^{b_2} = \lambda C^{b_2} + (1 - \lambda) D^{b_2}$ where $C^{b_2}$ and $D^{b_2}$ are distinct and $0 < \lambda < 1$. Then
the state of the bipartite system becomes

$$\lambda A^{a_1} C^{b_2} + (1 - \lambda) A^{a_1} D^{b_2} \tag{229}$$

which is not pure. This proves T27.

We can now prove the following

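T28 In the case where

Prob [circuit diagram not fully recovered: the preparation $U[m]$ on wire a enters $R$, whose output wires b and c meet the results $V[1]$ and $W[n]$] (230)

where $m = 1$ to $N_a$ and $n = 1$ to $N_c \geq N_a$, we have

[circuit equivalence not fully recovered: it involves the output wires b and c of $R$ and the result $V[1]$] (231)

for any preparation $A^{a_1}$. Furthermore, if $R$ is reversible, then: (i)
$A^{a_1}$ is pure if $B^{c_3}$ is and (ii) $B^{c_3}$ is pure if $A^{a_1}$ is.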

First, we notice it follows from (230) that $\{R_{a_1}^{b_2 c_3} V_{b_2}[1] W_{c_3}[n] : n = 1 \text{ to } N_a\}$
is a maximal measurement since it distinguishes the states $U^{a}[m]$. By T16
this means that the results $R_{a_1}^{b_2 c_3} V_{b_2}[i] W_{c_3}[n]$ with $i \neq 1$ must be null results
(as the outcomes associated with $i \neq 1$ can never happen for any state coming
into the $a_1$ input). We must always get $i = 1$ at the $V_{b_2}[i]$ effect. By T18
this remains true even if we do not put $W_{c_3}[n]$ on the $c_3$ output. By T14 we
know that the only state that is identified by $V_{b_2}[1]$ is $V^{b_2}[1]$. So the $b_2$ system
must be pure. The result (231) now follows immediately from T26. Finally, if
$R$ is reversible we note the following two points: (i) If $A^{a_1}$ is pure then it
follows that the output state from $R$ must be pure. It then follows from T27
that $B^{c_3}$ must be pure. (ii) If $B^{c_3}$ is pure then, since $V^{b_2}$ is pure, it follows from
T25 that the output of $R$ is pure. Since $R$ is reversible, the input must be pure
(a mixed input into a reversible transformation cannot lead to a pure output).
This proves T28.

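A slight elaboration on T28 is of some use.

T29 In the case where

Prob [circuit diagram not fully recovered: it involves $U[m]$, $V[1]$ and $W[n]$] (232)

for some given preparation $C^{d_2}$, we have

[circuit equivalence not fully recovered: it involves the output wires b and c, the result $V[1]$ and the preparation $C$] (233)

for any $A^{a_1}$. Furthermore, if $R$ is reversible, then $B^{c_3}$ is pure if $A^{a_1}$
and $C^{d_2}$ are.

We can regard the preparation $A^{a_1} C^{d_2}$ as playing the same role as the preparation
$A^{a_1}$ in T28. Hence T29 follows immediately from T28 and T25.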



10.4 Reversible transformations between states 

Postulate P4' says that there exists a reversible transformation effecting any
permutation of a given maximal set of distinguishable states. We can use the
postulates to prove a stronger result

T30 There exists a reversible transformation on a system that takes
the members of any given maximal set of distinguishable states and
transforms them into the members of any permutation of any other
given maximal set of distinguishable states for the system.

Let $\{U^{a_1}[m] : m = 1 \text{ to } N_a\}$ and $\{V^{a_1}[m] : m = 1 \text{ to } N_a\}$ be maximal sets of
distinguishable preparations for a system of type a. Let $\{W^{b_2}[m] : m = 1 \text{ to } N_b\}$
be a maximal set of preparations for a system of type b. We will assume that
$N_b = N_a$ (so b could be another instance of a). Let $\{U_{a_1}[m] : m = 1 \text{ to } N_a\}$,
$\{V_{a_1}[m] : m = 1 \text{ to } N_a\}$, and $\{W_{b_2}[m] : m = 1 \text{ to } N_b\}$ be the corresponding
maximal measurements. It follows from P2 that

$$\{U_{a_1}[n] W_{b_2}[m] : nm = 11, 12, \dots, N_a N_b\} \tag{234}$$

is a maximal measurement on the composite system of type ab. Let $P$ be a
permutation transformation on the composite system that effects the
permutation

$$\pi_P = (nm \leftrightarrow mn) \tag{235}$$

on the preparations $U^{a_1}[n] W^{b_2}[m]$. This is possible according to P4' (strictly
P4' says it is the states that are permuted but this implies that the
preparations are permuted to equivalent ones). Similarly, let $Q$ be a permutation
transformation on the composite system that effects the permutation

$$\pi_Q = (mn \leftrightarrow nm) \tag{236}$$

on the preparations $V^{a_1}[m] W^{b_2}[n]$. Then the following circuit will implement
the reversible transformation in T30



a W[l] 

Q 

XT 

b 

p 

a W[l] 

(237) 

The permutation transformation, $P$, swaps the state of the incoming system
(of type a) onto the intermediate system (of type b); then the permutation
transformation, $Q$, swaps the state onto the outgoing system (of type a) but
with respect to a new set of distinguishable states. By examining the effects
of the permutation transformations, it is clear that this transformation will
transform the state $U^{a}[n]$ into the state $V^{a}[n]$ (we can choose the labeling in the
V set in any way so this corresponds to an arbitrary permutation). Now we need
to check that it is a reversible transformation on a. It is clearly a transformation
on a system of type a since it has a system of type a going in and coming out.
To see that it is reversible consider following it by the transformation



a W[l] 

p 

X\ 

b 

%X 
Q 

a |W[1] 

(238) 

where $\tilde{P}$ is the inverse transformation to $P$ and $\tilde{Q}$ is the inverse transformation
to $Q$. Now consider sending in an arbitrary state, $A^{a}$, into the transformation
in (237). The state entering $P$ is $A^{a} W^{b}[1]$. The state leaving must, by T29, be
of the form $U^{a}[1] B^{b}$. The state entering $Q$ is therefore $V^{a}[1] B^{b}$. Using
T29 again, the state leaving $Q$ must be of the form $D^{a} W^{b}[1]$. Therefore the
state entering $\tilde{Q}$ is $D^{a} W^{b}[1]$ (i.e. the same state as left $Q$). Since $\tilde{Q}$ is the
inverse transformation to $Q$, the state leaving $\tilde{Q}$ is the same as the state that
entered $Q$, i.e. $V^{a}[1] B^{b}$. Hence, the state entering $\tilde{P}$ is $U^{a}[1] B^{b}$. This is the
same as the state that left $P$. Since $\tilde{P}$ is the inverse of $P$ the state leaving $\tilde{P}$ is
the same as the state that entered $P$, i.e. $A^{a} W^{b}[1]$. Hence the state leaving
the apparatus finally is $A^{a}$ which is the same as the state that entered. This
works for all incoming states. It also works if the incoming system is part of
a composite system. In this case the state for the composite is in the tensor
product space (as follows from P3) and so the identity acting on one component
is equal to the identity acting on the whole. Hence the transformation in (237)
is reversible.

We immediately obtain 

T31 Transitivity. There exists a reversible transformation between 
any pair of pure states. 

This follows from T30 and T12 since any pure state is a member of some
maximal distinguishable set of states.

We also obtain

T32 If we perform reversible transformations on each component
of a bipartite system then we effect a reversible transformation on
the composite system.

Consider a composite transformation $L_{a_1}^{a_3} R_{b_2}^{b_4}$ where both $L_{a_1}^{a_3}$ and $R_{b_2}^{b_4}$ are
reversible. If $L_{a_1}^{a_3}$ and $R_{b_2}^{b_4}$ are the transformation matrices for these
transformations then there must exist transformations with transformation matrices,
$\tilde{L}$ and $\tilde{R}$, that are the inverses of $L_{a_1}^{a_3}$ and $R_{b_2}^{b_4}$. Consider applying $L_{a_1}^{a_3} R_{b_2}^{b_4}$ to a
product state, $A^{a_1} B^{b_2}$. The new state will be $A^{a_1} L_{a_1}^{a_3} B^{b_2} R_{b_2}^{b_4}$. This evolution can
be reversed by applying the inverse transformation on each component. Hence,
for product states, T32 is true. However, it follows from P3 that the product
states form a spanning set for the full set of states (since the product states
span a vector space of dimension $K_a K_b$). Hence, $L_{a_1}^{a_3} R_{b_2}^{b_4}$ is non-singular,
having inverse equal to $\tilde{L}\tilde{R}$. Since the latter transformation can be physically
realized, T32 follows for all states.
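A very restricted instance of T32 can be checked by hand. The sketch below assumes classical probability theory and takes the reversible transformations to be permutations of the distinguishable states of each component; it does not address the general matrix argument above. The permutations `perm_l` and `perm_r` and the helper names are hypothetical.

```python
from itertools import product

def compose_parallel(perm_l, perm_r):
    """Act with perm_l on the first label and perm_r on the second label."""
    return {(a, b): (perm_l[a], perm_r[b]) for a, b in product(perm_l, perm_r)}

def inverse(perm):
    """Invert a permutation stored as a dict from source label to image label."""
    return {image: source for source, image in perm.items()}

perm_l = {0: 1, 1: 2, 2: 0}   # reversible map on the first component
perm_r = {0: 1, 1: 0}         # reversible map on the second component

composite = compose_parallel(perm_l, perm_r)
composite_inverse = compose_parallel(inverse(perm_l), inverse(perm_r))

# Applying the composite and then the component-wise inverse returns every label.
round_trip = {pair: composite_inverse[composite[pair]] for pair in composite}
print(all(pair == image for pair, image in round_trip.items()))
```

Here the inverse of the composite map is exactly the composite of the inverses, which is the classical counterpart of the statement that the composite transformation has inverse $\tilde{L}\tilde{R}$.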

10.5 Constructing arbitrary filters

We cannot assume that filters exist. Rather, we have to construct them from
the postulates.

We will now prove the following.

T33 We can construct arbitrary filters.

This means that we can construct a filter associated with any informational
subset $S$ defined with respect to any maximal measurement. We will
construct an arbitrary filter on a system of type a. Consider two systems of
types a and b. Together they form a composite of type ab, which is also
a system. Let $\{U^{a_1}[n] : n = 1 \text{ to } N_a\}$ be the maximal set of
distinguishable preparations for a with respect to which we define $S$ for the filter. Let
$\{U_{a_1}[n] : n = 1 \text{ to } N_a\}$ be some maximal measurement that distinguishes this set.
Let $\{V^{b_1}[m] : m = 1 \text{ to } N_b\}$ be any maximal set of distinguishable preparations
for b. Let $\{V_{b_1}[m] : m = 1 \text{ to } N_b\}$ be the corresponding maximal measurement.
It follows from P2 that $\{U^{a_1}[n] V^{b_1}[m] : nm = 11, 12, \dots, N_a N_b\}$ is a maximal
set of distinguishable states for the composite system. By P4 there exists a
reversible permutation transformation, $P$, on the composite system which
effects the following permutation:

$$\pi = \begin{cases} nm \leftrightarrow mn & \text{if } n \text{ and } m \in O(S) \\ nm \leftrightarrow nm & \text{otherwise} \end{cases} \tag{239}$$
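The action of this permutation on the labels can be tabulated explicitly. The following is a minimal sketch that treats the labels $nm$ as pairs of integers and $O(S)$ as a set of integers; the function name `filter_permutation` and the choice of four levels with $O(S) = \{1, 2\}$ are assumptions made only for illustration. It acts on labels alone and says nothing about how the transformation acts on the remaining real parameters of the state.

```python
from itertools import product

def filter_permutation(n_levels, subset):
    """Map nm -> mn if n and m are both in `subset`, otherwise nm -> nm."""
    perm = {}
    for n, m in product(range(1, n_levels + 1), repeat=2):
        perm[(n, m)] = (m, n) if n in subset and m in subset else (n, m)
    return perm

pi = filter_permutation(4, {1, 2})

# The map is a bijection on the label pairs and is its own inverse.
is_bijection = sorted(pi.values()) == sorted(pi.keys())
is_involution = all(pi[pi[pair]] == pair for pair in pi)

# Only pairs with both labels in the subset (and n != m) are actually moved.
moved = sorted(pair for pair, image in pi.items() if image != pair)
print(is_bijection, is_involution, moved)
```

The check that the label map is its own inverse is the statement $\pi = \pi^{-1}$ at the level of labels only.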

We choose b such that $N_b = N_a$ to allow such a permutation. Since $P$ is
reversible, there exists another transformation, $\tilde{P}$, such that $P$ followed by $\tilde{P}$ is the
identity transformation. Note that although $\pi = \pi^{-1}$ for the above choice of
$\pi$, it does not follow that $P$ is equal to $\tilde{P}$ because the transformation
acts on $K_{ab}$ (rather than just $N_{ab}$) real parameters. We choose some particular
$n_1 \in O(S)$. We will show that the set of results



[Circuit diagram of the set of results, one for each value of $n$ and $m$, not recovered.] (240)

constitute a maximal measurement that distinguishes the maximal distinguishable
set of preparations, $\{U^{a_1}[n] : n = 1 \text{ to } N_a\}$, we started with. Note that $T$
is the deterministic result. Here we have $N_a^2$ results since we have one for each
value of $n$ and $m$. Consider the input $U^{a_1}[n']$. If $n' \in O(S)$, then the output
from $P$ will be $U^{a_3}[n'] V^{b_4}[n_1]$. Hence only results having $n = n'$ will fire. In
fact, following the state $V[n']$ through $P$ we see that we must also have $m = n_1$
for the result to fire. If $n' \notin O(S)$ then, examining the effects of the
permutation transformations, we see that only the effect that has $n = n_1$ and $m = n'$
will fire. Hence we have a set of results that distinguishes the maximal set of
distinguishable states. Theoretically, we still have outcomes with $n = n_1$ and
$m \in O(S)$ which do not happen for any of the inputs $U^{a_1}[n]$. We can bundle
all such outcomes together and add them to the outcomes associated with, say,
$U^{a_1}[n_1]$. We now have a maximal measurement that covers all outcomes which
distinguishes the maximal set of preparations we started with. By T15 this
maximal measurement is equivalent to $\{U_{a_1}[n] : n = 1 \text{ to } N_a\}$ with respect to
which we have defined $S$. We will now show that the transformation



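[Circuit diagram (241), not recoverable from the extracted text; the surviving labels are the deterministic result $T$, the systems a and b, and two occurrences of $P$.]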

is a filter with respect to $S$. Assume we send in a system with preparation $A^{a_1}$.
If we do not see $n = n_1$ at the $U_{a_3}[n_1]$ result after $P$ then the transformation
will have failed to happen. We see that the transformation is certain to happen
if $A^{a_1} \in S$ (so we will certainly get $n = n_1$ at the aforementioned effect). By
T14 we know that the only state which can be identified by the maximal result
$U_{a_3}[n_1]$ is the pure state $U^{a_3}[n_1]$. It follows from T26 that the state after $P$ is
$U^{a_3}[n_1]\,B$ where $B$ is some state for system b. This is the same as the state
going into $P^{-1}$. Hence, since $P^{-1}$ is the inverse of $P$, the state coming out of $P^{-1}$ must
be the same as the state going into $P$. Hence, the state that finally emerges
is the state $A$ that was sent in. This proves the first property required of a filter (that it
passes states in $S$). Since the set of effects in (240) constitute a maximal measurement, it
follows that if $A^{a_1} \in \bar{S}$ then we will definitely not see $n = n_1$ at the effect after $P$.
By causality (T18) this is true if we send the system into the transformation
in (241). Hence, the incoming system will be blocked by this transformation.
This proves that this transformation is a filter with respect to $S$. Since $S$ can be
any informational subset (defined with respect to any maximal measurement)
we have constructed an arbitrary filter.
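
The outcome bookkeeping in this argument can be checked on the labels alone. The following is a minimal sketch, assuming a toy choice of $N_a = 5$ and $O(S) = \{1, 2, 3\}$ (both hypothetical, as are the names `pi`, `outcome` and `passes`). It tracks only the labels $nm$ through the permutation and says nothing about the $K_{ab}$ real parameters on which $P$ acts.

```python
# Toy bookkeeping for the permutation pi on pairs of labels (n, m).
# N_A, O_S and N1 are illustrative choices, not values from the text.
N_A = 5
O_S = {1, 2, 3}          # outcome set O(S) of the informational subset S
N1 = 1                   # the particular n_1 in O(S) used for the ancilla

def pi(n, m):
    """Swap the labels when both lie in O(S); otherwise leave them alone."""
    return (m, n) if (n in O_S and m in O_S) else (n, m)

def outcome(n_in):
    """Labels (n, m) that fire when U[n_in] is sent in with ancilla V[n_1]."""
    return pi(n_in, N1)

outcomes = {n: outcome(n) for n in range(1, N_A + 1)}

# pi is an involution on labels, mirroring pi = pi^{-1} in the text.
assert all(pi(*pi(n, m)) == (n, m)
           for n in range(1, N_A + 1) for m in range(1, N_A + 1))
# Distinct inputs give distinct outcomes, so the results distinguish them.
assert len(set(outcomes.values())) == N_A
# The condition "n = n_1 after P" holds exactly for inputs in O(S).
passes = {n for n, (first, _) in outcomes.items() if first == N1}
assert passes == O_S
```

In this toy model the inputs with labels in $O(S)$ are exactly those that reach the $n = n_1$ result, which is the property the filter construction relies on.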

It is worth noting that we have not proven that all filters for some given $S$
will process states that are not in either $S$ or $\bar{S}$ in the same way. In fact, in both
classical probability theory and quantum theory they do, and so it will follow
from the postulates that all filters for a given $S$ are equivalent. This will only
be apparent once the reconstruction is complete. For the time being, however,
we will see that a certain class of filters (special filters) have the property that
they correspond to projective maps onto the subspace spanned by $S$ and so do
process states not in $S$ or $\bar{S}$ in the same way.

10.6 Special filters

We define 

A special filter, $F_{a_1}^{a_2}$, is a filter having the property that it belongs
to a set of transformations, $F_{a_1}^{a_2}[n]$, corresponding to the same setup
having disjoint outcome sets labeled by $n$ such that



where $\{U_{a_1}[n] : n = 1 \text{ to } N_a\}$ is the maximal measurement with
respect to which $S$ for the filter is defined and $T_{a_2}$ is the deterministic
result.

It follows from T18 and T16 that the deterministic result, $T_{a_2}$, can be a maximal
measurement where we coarse grain over outcomes to have only one outcome
set. This means that a special filter followed by the maximal measurement
with respect to which the filter is defined is equivalent to the given maximal
measurement (by T15).

We have the following theorem 

T34 Arbitrary special filters exist. 

This means that we can construct such a special filter for any informational 
subset defined with respect to any maximal measurement. In fact we have 
already proven this. It is clear by inspection that the filter in (241) (see also
(240)) is an arbitrary special filter.
We now prove 

T35 Any preparation followed by a special filter, $F_{a_1}^{a_2}$, effects a preparation
in the informational subset, $S$, associated with the filter. That
is



for special filters. 

Consider the set of results $\{F_{a_1}^{a_2}[n]\,U_{a_2}[m]\}$. These results (with appropriate
coarse-graining as explained above) constitute a maximal measurement for the
preparations $\{U^{a}[n] : n = 1 \text{ to } N_a\}$. By T15 this maximal measurement is
equivalent to any other for this set of preparations. The outcomes in $\{n_1 m :
m \in \bar{O}(S)\}$ on the results $\{F_{a_1}^{a_2}[n]\,U_{a_2}[m]\}$ do not happen for any of the
preparations in $\{U^{a}[n] : n = 1 \text{ to } N_a\}$. Hence, by T16 they cannot happen for any
incoming state. This proves T35.

We can use T35 to prove the following

T36 Fiducial results for a system defined with respect to a special
filter $F_{a_1}^{a_2}$ can all be of the form




Fll for some particular m € O(S) (242) 
U ai [n] for neO{S) (243) 
U a >] for neO(S) (244) 



$$A^{a_1} F_{a_1}^{a_2} \in S \tag{245}$$




(246) 
This follows since the system, having emerged out of a special filter will,
by T35 and the definition of a filter, pass through $F_{a_1}^{a_2}$ unchanged. Hence, all
fiducial results can be of the given form.
We will prove the following 

T37 Any special filter, F, is represented by a projective map into the 
subspace spanned by states in the associated informational subset 
S. 

Note that this theorem plays no role in the reconstruction, though it is mentioned
in the postlude. By T35 we have
$$A^{a_1} F_{a_1}^{a_3} X_{a_3}^{a_2} = A^{a_1} F_{a_1}^{a_4} F_{a_4}^{a_3} X_{a_3}^{a_2} \tag{247}$$



Here $\{X_{a_1}^{a_1} : a_1 = 1 \text{ to } K_a\}$ is a set of fiducial results for type a. Since this
fiducial set is complete for the unfiltered system, it must also be complete (in fact
over complete) for the filtered system (take this to be of type c). It follows from
(247) that $\{F_{a_1}^{a_3} X_{a_3}^{a_2} : a_2 = 1 \text{ to } K_a\}$ is an over complete set of fiducial results for
c. By choosing a subset of $K_c$ of these that correspond to linearly independent
effects, we can form a complete set of fiducial results, $\{X_{c_1}^{c_1} : c_1 = 1 \text{ to } K_c\}$,
for c. The filter on the LHS of (247) can be regarded as part of the fiducial
effect so the LHS constitutes measuring the fiducial results before filtering. The
second filter on the RHS of (247) can be regarded as part of the fiducial results
so the RHS constitutes measuring the fiducial result after filtering. Hence, it
follows from (247) that we get the same probabilities for these results whether
we measure them before filtering or after filtering. We also know by T35 that,
after filtering, the state is in the subspace associated with c. We can think of
the initial state as having a component that is in the space spanned by the
states of system c and an orthogonal component. It follows from the facts just
established that the state after the filter is given by just the first component.
This proves T37.

We can regard the system after a filter, $F_{a_1}^{a_2}$, as a new type of system, c, say.
We can write the filter as $F_{a_1}^{c_2}$ if we want to emphasize this fact. Next we will
prove

T38 A system type, c, created by applying a special filter has $N_c$
equal to the capacity of the filter.

Recall that the capacity of a filter is $|O(S)|$ where $S$ is the informational subset
associated with the filter. Let $\{W_{c_2}[j] : j = 1 \to N_c\}$ be a maximal measurement
for c. Consider the set of effects

$$F_{a_1}^{c_2}[n]\, W_{c_2}[j] \tag{248}$$

This set of results constitutes a measurement which can distinguish $|\bar{O}(S)| + N_c$
distinguishable states in the following way. The state $U^{a_1}[n]$ for $n \in \bar{O}(S)$ gives
outcome $n$ (which is not equal to $n_1$, the outcome for which the filter is effected)
at $F[n]$. Hence, by coarse graining over $W$'s outcomes, we can distinguish these
states. The states $W^{a_1}[j]$ are in $S$ and give rise to outcome $n_1$ at $F[n]$ and
outcome $j$ at $W_{c_2}[j]$. If $N_c > |O(S)|$ then
we have constructed a measurement which can distinguish more than $N_a$ states
on systems of type a (which is impossible since this is the maximum number
that can be distinguished). We clearly can distinguish $|O(S)|$ states on c simply
by using the measurement $\{U_{a_1}[n] : n = 1 \text{ to } N_a\}$. Hence, $N_c = |O(S)|$.
We can use this to show the following important result 

T39 We can create systems having arbitrary $N_a$.

Assume we want to create a system having a particular value of $N_a$. By Assump
2, there exists at least one type of system, z, having $1 < N_z < \infty$ (actually
Assump 2 says that $K_z < \infty$ but we must have $N_z \le K_z$). Hence we can create
a composite system $zz \ldots z$ having $N_{zz \ldots z} \ge N_a$ (this uses P2 which implies
$N_{bc} = N_b N_c$). By T33 and T38 we can filter down to a system having the
required value of $N_a$.
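
The counting behind this argument is simple arithmetic. As a minimal sketch (the function name `copies_needed` and the sample values are illustrative assumptions, not taken from the text), the following finds how many copies of z must be composed before filtering down to the target capacity.

```python
def copies_needed(n_z, n_target):
    """Smallest k with n_z**k >= n_target, and the capacity n_z**k reached.

    n_z plays the role of N_z (which must exceed 1) and n_target the
    desired N_a. Under P2 the capacity multiplies with each added copy.
    """
    if n_z <= 1:
        raise ValueError("need a system with N_z > 1")
    k, capacity = 1, n_z
    while capacity < n_target:
        capacity *= n_z
        k += 1
    return k, capacity

# Example: bits (N_z = 2) composed until the capacity reaches at least 5.
k, capacity = copies_needed(2, 5)
surplus = capacity - 5   # number of outcomes the filter must exclude
```

With these sample values three copies give capacity 8, and a filter of capacity 5 then removes the remaining 3 outcomes.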

10.7 Systems with same $N_a$ are equivalent

We can replace one system with another in a circuit by the following move 



The transformations Y and Z can be absorbed into the definitions of the operations
from which the system of type a is outputted and into which it is inputted.
We will call this move a system substitution. We use this for the following
definition.

Equivalence of system types. We say that two system types, 
a and b, are equivalent if there exists a fixed system substitution 
by which any occurrence of a system of type a in any circuit can be 
replaced by a system of type b and another fixed system substitution 
by which any occurrence of a system of type b in a circuit can be 
replaced by a system of type a such that the probability for the circuit is
unchanged by these substitutions. 

We will now prove 




(249) 
T40 Systems a and b are equivalent if and only if $N_a = N_b$.



In other words, system types having the same information carrying capacity are
equivalent (this was used as an axiom in [36]). Note this holds true whether the
systems in question are proto-systems having the given information carrying
capacity or have been obtained by filtering. It is clear that a and b are not
equivalent if $N_a \ne N_b$ since the system having smaller $N$ will not be able to
support as many distinguishable states. Let $\{U^{a_1}[n] : n = 1 \text{ to } N_a\}$ be a maximal
set of distinguishable states for a system of type a with associated maximal
measurement $\{U_{a_1}[n] : n = 1 \text{ to } N_a\}$. Let $\{V^{b_2}[m] : m = 1 \text{ to } N_b\}$ be a
maximal set of distinguishable states for b with associated maximal measurement
$\{V_{b_2}[m] : m = 1 \text{ to } N_b\}$. We will now show that the following are system
substitutions which prove equivalence when $N_a = N_b$.



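[Circuit diagrams (250), not recoverable from the extracted text: the $a \to b$ system substitution on the left and the $b \to a$ system substitution on the right. The surviving labels are $U[1]$, $V[1]$, and the deterministic result $T$.]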



where $T$ is the deterministic result and where $P$ is a reversible transformation
effecting the permutation

$$\pi = (nm \leftrightarrow mn) \tag{251}$$

of the preparations $\{U^{a_1}[n]\,V^{b_2}[m] : nm = 11, 12, \ldots\}$ (by P2 such a set constitutes
a maximal set of preparations for the composite and by P4 the reversible
transformation, $P$, exists). The inverse transformation to $P$ is $P^{-1}$. If the state
going into the $a \to b$ substitution circuit is $A^{a_1}$ then the state going into $P$ is
$A^{a_1} V^{b_2}[1]$. Then it follows from T29 that the state coming out of $P$ is of the
form $U^{a_1}[1]\,B^{b_2}$. The state going into $P^{-1}$ will be of the same form. Hence, the
state coming out of $P^{-1}$ will be of the form $A^{a_1} V^{b_2}[1]$ since $P^{-1}$ is the inverse of
$P$. Hence, the state emerging from the substitution circuit is the same as the
state that was sent in. This proves we can replace a with b for circuits that are
partitioned into two pieces by the system a. However, in general, it could be the
case that the circuit is of the form




(252) 



(where c may denote a composite system and the preparation A and the result B 
may be comprised of many operations). However, the statistics of such a circuit 
can, by P3, be determined by the statistics of circuits of the form 



D 



(253) 



If circuits of the form (253) have the same probabilities under the substitution,
then it follows from P3 that circuits of the form (252) will also. Hence, it is
sufficient to consider only circuits of this form. In this circuit, we can regard
$A^{a_1 c_2} D_{c_2}$ as a preparation of a system of type a upon which we perform effect
$C_{a_1}$. For such circuits we have shown that we can substitute a by b. It is clear,
by a similar argument, that we can substitute b by a using the substitution
shown on the RHS in (250). This proves T40.

From the above argumentation we also see that there exists a linear and 
invertible transformation between states of the a system and the corresponding 
states for the b system. Hence, equivalence implies the following in general 

T41 We can find fiducial sets of results for equivalent systems with 
respect to which the set of allowed states, transformations, and ef- 
fects on a system are the same. 



10.8 Two filters 

We have already noted that if we have one filter, $F_{a_1}^{a_2}$, followed by another filter,
$F_{a_2}^{a_3}$, of the same type then the compound operation, $F_{a_1}^{a_2} F_{a_2}^{a_3}$, is also a filter and
filters with respect to the same informational subset. What happens when we
have two filters that are not of the same type? We will treat a special case where 
both are special filters and filter with respect to the same maximal measurement. 
We will prove 

T42 If $F_{a_1}^{a_2}$ is a special filter for informational subset $S_F$ and $G_{a_2}^{a_3}$ is a
special filter with informational subset $S_G$ where both informational
subsets are defined with respect to the same maximal measurement,
then

$$F_{a_1}^{a_2} G_{a_2}^{a_3} \tag{254}$$

is a filter with respect to the informational subset $S_{F \cap G}$ defined with
respect to the same maximal measurement where

$$O(S_{F \cap G}) = O(S_F) \cap O(S_G) \tag{255}$$


Since $S_{F \cap G} \subseteq S_F, S_G$, it is clear that if the input state is in $S_{F \cap G}$ it will pass
through both filters unchanged. Now we need to prove that if the input state
is in $\bar{S}_{F \cap G}$ it will be blocked. Consider the set of effects

$$F_{a_1}^{a_2}[p]\, G_{a_2}^{a_3}[q]\, U_{a_3}[r] \tag{256}$$

where $\{U_a[r] : r = 1 \text{ to } N_a\}$ is the maximal measurement with respect to
which $F$ and $G$ are defined. We know that a special filter followed by a maximal
measurement actually corresponds to an equivalent maximal measurement (with
appropriate coarse-graining). Hence, $G_{a_2}^{a_3}[q]\,U_{a_3}[r]$ here corresponds to a maximal
measurement. And consequently, $F_{a_1}^{a_2}[p]\,G_{a_2}^{a_3}[q]\,U_{a_3}[r]$ corresponds to a maximal
measurement. States in $\bar{S}_{F \cap G}$ simply followed by the maximal measurement
$\{U_a[n] : n = 1 \text{ to } N_a\}$ will give rise only to outcomes having $n \in \bar{O}(S_{F \cap G})$.
Paying attention to the definition of special filters, this means that, for such
states, we must have either $p \ne p_1$ or $q \ne q_1$ (or both) where $p_1$ and $q_1$ are the
values of $p$ and $q$ for which the filtering is effected. Consequently, such states
must be blocked. This proves T42.



10.9 Relationship between $K_a$ and $N_a$

We are now in a position to prove the following 

T43 The relationship between $K_a$ and $N_a$ for any system is of the
form

$$K_a = N_a^r \tag{257}$$

where $r = 1, 2, \ldots$ is a constant independent of the system type.

We will drop the subscript, a, for the moment. To prove T43 we note that it
follows from T41 that $K$ is a function of $N$:

$$K = K(N) \tag{258}$$

We can filter any system having $N + 1$ distinguishable states to have just $N$,
or to have just 1, such that the informational subsets associated with these two
filtrations are nonoverlapping. Hence,

$$K(N + 1) > K(N) \tag{259}$$
From P2 we know that $N_{ab} = N_a N_b$. Hence P3 implies

$$K(N_a N_b) = K(N_a) K(N_b) \tag{260}$$

In number theory we would say that $K(\cdot)$ is a completely multiplicative function.
Finally, we know that we have systems for which

$$N = 1, 2, 3, \ldots \tag{261}$$

It is proven in Appendix 1 that T43 follows from (258)-(261). This proof works
by considering the prime factorisation of $N$. Without the condition that $K$ is
an increasing function of $N$ we could have completely multiplicative functions
in which different prime factors are raised to different powers:

$$K(N) = \prod_i p_i^{r_i m_i(N)} \tag{262}$$

where $p_i$ is the $i$th prime number, $r_i$ is the power associated with it, and
$m_i(N)$ is the multiplicity of $p_i$ in the prime factorisation of $N$ (equal to 0
if this prime factor does not appear).
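
The role of the monotonicity condition (259) can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the helper names (`prime_factorisation`, `make_K`, `K_uniform`, `K_mixed`) and the particular exponents are hypothetical choices, and the checks run over a small finite range rather than proving anything in general.

```python
def prime_factorisation(n):
    """Return {prime: multiplicity} for a positive integer n."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def make_K(exponent_for_prime):
    """Completely multiplicative K with K(p) = p**exponent_for_prime(p)."""
    def K(n):
        result = 1
        for p, mult in prime_factorisation(n).items():
            result *= p ** (exponent_for_prime(p) * mult)
        return result
    return K

K_uniform = make_K(lambda p: 2)                  # same power for every prime: K(N) = N**2
K_mixed = make_K(lambda p: 3 if p == 2 else 1)   # hypothetical: different powers per prime

def is_completely_multiplicative(K, limit=30):
    """Check K(a*b) == K(a)*K(b) for all 1 <= a, b < limit."""
    return all(K(a * b) == K(a) * K(b)
               for a in range(1, limit) for b in range(1, limit))

def is_strictly_increasing(K, limit=30):
    """Check K(n+1) > K(n) for all 1 <= n < limit."""
    return all(K(n + 1) > K(n) for n in range(1, limit))
```

Both functions pass the multiplicativity check, but `K_mixed` gives $K(2) = 8$ and $K(3) = 3$, so it violates (259). Only the choice with a common exponent survives the monotonicity condition on this range.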

An alternative proof that $K_a = N_a^r$ is given in Sec. 10.11 below.

This relationship between $K$ and $N$ in T43 was first suggested as a possible
relationship by Wootters [751 HI]. It was first proved that it follows from the
above conditions in [36]. It suggests a hierarchy of classes of theories which
we will call the Wootters hierarchy. We will see below that the first theory
in the hierarchy, when $r = 1$, is classical probability theory. The next class of
theories, when $r = 2$, contains quantum theory (there are other toy theories
having $r = 2$ that are not consistent with the postulates in this paper [4"9l IBS]).
We will see that there are no theories consistent with the postulates having 
r > 2. Kirkpatrick [49] has given a simple model for theories with a finite 
number of pure states having any value of r. Zyczkowski [77] has worked on 
constructing theories having r = 4 which have a continuum of pure states. Both 
Kirkpatrick's model and Zyczkowski's construction violate one or more of the 
postulates given here. 

10.10 Classical and non-classical cases 

We note the following theorem. 

T44 We have classical probability theory if and only if $K_a = N_a$.

If $K_a = N_a$ then one set of fiducial effects is simply the maximal effects,
$\{U_a[n] : n = 1 \text{ to } N_a\}$, corresponding to a given maximal measurement. Since these
fiducial effects all belong to the same measurement, the probabilities, $A^{a_1}$, in a
general state defined with respect to this fiducial set of effects must satisfy

$$\sum_{a_1=1}^{N_a} A^{a_1} \le 1 \tag{263}$$
The states in the maximal set of distinguishable states are represented by vectors,
$U^{a_1}[n]$, in which one probability is equal to 1 and all the others are equal
to zero. These states are pure. We have, then,

$$A^{a_1} = \sum_{n=1}^{N_a} p_n U^{a_1}[n] \tag{264}$$

where $p_n$ are the fiducial probabilities (numerically equal to the entries of $A^{a_1}$). Hence a
general state can be written as a convex combination of the pure states in this 
given maximal set of distinguishable states and the null state. This means that 
these are the only pure states. This is the defining characteristic of classical 
probability theory (that there is only one maximal set of distinguishable states, 
all these being pure). It is a simple matter to show that we get the classical 
probability simplex, the correct rules for composite systems, for transformations, 
and so on (see [40] for example). In the case that K > N there must exist at 
least N + 1 pure states. This is inconsistent with classical probability theory. 

10.11 The signature 

The results in this subsection play no role in the reconstruction and can be
skipped in a first reading (although we do give an alternative derivation of the
relationship, $K_a = N_a^r$).

One particularly illuminating way of viewing the relationship between $K_a$
and $N_a$ is in terms of what we will call the signature. The signature tells
us something about possible choices for the fiducial set of results. We can
construct a fiducial set of results by applying various (special) filters all defined
with respect to a given maximal measurement $\{U_a[n] : n = 1 \text{ to } N_a\}$. Let
$F_{a_1}^{a_2}\{n, n', n'', \ldots\}[p]$ be the set of transformations associated with a special filter
that filters with respect to informational subset $S_{nn'n''\ldots}$ defined with respect
to the given maximal measurement where $O(S_{nn'n''\ldots}) = \{n, n', n'', \ldots\}$. The
filter is effected for $p = p_1$. We will first consider all such filters where $O$ has
one element. There are $N_a$ such filters. Then we will consider the cases where
$O$ has two elements. There are $N_a(N_a - 1)/2$ such filters. And so on. For each
case we will consider the fiducial results that are formed by placing another
special filter, $F_{a_2}^{a_3}\{n, n', \ldots\}[p]$, of the same type followed by some result (we
know from T36 that all fiducial results can be of this type). At each stage we
count only the additional fiducial results required beyond those that have been
counted already. We will, of course, make use of T42 (the theorem concerning
overlapping filters). Now we will implement this procedure.

Step 1 Consider special filters $F_{a_1}^{a_2}\{n\}[p]$. For systems passing through such a
filter, we can form a fiducial set of results by letting them pass through
another special filter, $F_{a_2}^{a_3}\{n\}[p]$, of the same type and follow this by some
result. Let $x_1$ be the number of fiducial results required for such a system.
Since there are $N_a$ filters of this type, we have so far counted $x_1 N_a$ fiducial
results. In fact, we know that $x_1 = 1$ since it follows from T43 that
systems having $N = 1$ have $K = 1$. Further, we can actually choose these
fiducial results to be equal to the elements, $U_{a_2}[n]$, of the given maximal
measurement.

Step 2 Consider special filters $F_{a_1}^{a_2}\{n, n'\}[p]$. For systems passing through such
a filter, we can form a fiducial set of results by letting them pass through
another special filter, $F_{a_2}^{a_3}\{n, n'\}[p]$, of the same type and follow this by
some result. We have already counted some contributions to this set of
fiducial effects in Step 1. Let $x_2$ be the number of additional fiducial effects
required for each such filtration. We count $x_2 N_a(N_a - 1)/2$ contributions in this
step (since there are $N_a(N_a - 1)/2$ filters of this type). These contributions
are all independent by virtue of T42 since $F_{a_1}^{a_2}\{n, n'\}[p_1]\, F_{a_2}^{a_3}\{n, n''\}[p_1]$ is
equivalent to a filter with $O$ equal to the intersection of $\{n, n'\}$ and $\{n, n''\}$
and we have already counted such contributions in Step 1.



Step 3 And so on. 



Adding up all these contributions to $K_a$ we have

$$K_a = x_1 N_a + x_2 \frac{N_a(N_a - 1)}{2!} + x_3 \frac{N_a(N_a - 1)(N_a - 2)}{3!} + \cdots \tag{265}$$

Since $K_a$ is finite for finite $N_a$ we must have $x_i = 0$ for all $i > r$ where $r$ is a
finite integer. This means that $K_a$ is a polynomial function of $N_a$ of finite order:

$$K = \sum_{n=1}^{r} c_n N^n \tag{266}$$

If we put this into $K(N_a N_b) = K(N_a) K(N_b)$ and compare coefficients we see
that we must get $K = N^r$. This provides an alternative derivation of this
relationship without using the number theoretic arguments of the appendix.
We will call the series of integers

$$x = (x_1, x_2, x_3, \ldots) \tag{267}$$

the signature. Here are a few examples

$$x_{K=N} = (1, 0, 0, 0, 0, \ldots) \tag{268}$$
$$x_{K=N^2} = (1, 2, 0, 0, 0, \ldots) \tag{269}$$
$$x_{K=N^3} = (1, 6, 6, 0, 0, \ldots) \tag{270}$$
$$x_{K=N^4} = (1, 14, 36, 24, 0, \ldots) \tag{271}$$

We get these by putting $r = 1, 2, 3, 4$ in (265). This gives some insight into the
Wootters hierarchy. For the classical case there is nothing nontrivial happening 
beyond rank one filters. For quantum theory (K = N 2 ) there is nothing nontriv- 
ial happening beyond rank two filters. This appears to be related to the Sorkin 
hierarchy [67] (see also (TTJ [65j ED] ) . The Sorkin hierarchy concerns multi-slit 
interference experiments. Classical interference has nothing non-trivial beyond
one slit (two slit interference can be decomposed into one slit patterns). Quan- 
tum interference has nothing non-trivial beyond two slits (three slit interference 
can be decomposed into two and one slit patterns). 
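
The signature entries quoted for $K = N^r$ can be recomputed directly from the counting formula $K = \sum_i x_i \binom{N}{i}$. The following is a minimal sketch, assuming that formula and $K = N^r$; the function name `signature` and the `length` parameter are illustrative. It solves for $x_1, x_2, \ldots$ one at a time by evaluating both sides at $N = 1, 2, \ldots$.

```python
from math import comb

def signature(r, length=6):
    """Solve N**r = sum_i x_i * C(N, i) for x_1, ..., x_length.

    At N = n only the terms with i <= n contribute, and C(n, n) = 1,
    so x_n is fixed once x_1, ..., x_{n-1} are known.
    """
    xs = []
    for n in range(1, length + 1):
        known = sum(x * comb(n, i) for i, x in enumerate(xs, start=1))
        xs.append(n ** r - known)
    return tuple(xs)

# First five signature entries for r = 1, 2, 3, 4.
signatures = {r: signature(r, 5) for r in (1, 2, 3, 4)}
```

For each $r$ the entries vanish beyond position $r$, in line with the statement that nothing nontrivial happens beyond rank $r$ filters.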

11 Gebits 

In this section we will prove that the $N_a = 2$ case (the generalized bit or gebit) is in
agreement with quantum theory (in particular that states belong to the Bloch
sphere). We do this in three basic steps. First we show that the pure states
correspond to some subset of the points on a hypersphere. Second, we show that,
in fact, every point on the hypersphere corresponds to a pure state. Third, we
show that $K_a = 4$ when $N_a = 2$. The second and third steps involve the use
of P5 (for the first time in the reconstruction). In this third step we adapt an
ingenious method developed by CDP involving teleportation.

11.1 Gebits - basic properties 

It follows from T16 that, for a gebit, we can write the deterministic effect, $T_a$,
as

$$T_a = U_a[1] + U_a[\bar{1}] \tag{272}$$

for any maximal measurement $\{U_a[n] : n = 1, \bar{1}\}$ (in this subsection it is
notationally convenient to use 1 and $\bar{1}$ as labels rather than 1 and 2). We saw in T19
that any reversible transformation on a system is equivalent to a deterministic
transformation. It follows from this and the fact that the deterministic effect
is unique (T18) that the deterministic effect is unchanged when preceded by
any reversible (and therefore deterministic) transformation

$$R_{a_2}^{a_1} T_{a_1} = T_{a_2} \tag{273}$$

(this is actually true for any deterministic transformation). If a maximal effect,
$U_a[n]$, is preceded by a reversible transformation we have another maximal effect

$$V_{a_2}[n] = R_{a_2}^{a_1} U_{a_1}[n] \tag{274}$$

since this set of effects distinguishes the states obtained by applying $R^{-1}$ to
$U^{a_1}[n]$, where $R^{-1}$ is the inverse transformation to $R$.

By T31 we know that there exists a reversible transformation, $R$, which
takes any pure state to any other pure state. If we apply two such transformations
we get a third. The reversible transformations must form a group, $\mathcal{G}$,
whose elements can be represented by matrices. It follows from Assump 3
that this matrix group is compact (see also Appendix [C]). Any compact matrix
group admits an orthogonal representation [TU]. If we go to the orthogonal
representation then transformations will not change the length of the vectors (in
this orthogonal representation) representing the state and hence all pure states
must lie on (the surface of) a hypersphere. However, we do not know that all
points on this hypersphere will have states at them.

In the orthogonal representation we will denote vectors by bold font lower
case letters and indicate whether we have an effect or state by a subscript or
superscript for the system type. We will also include a constant factor, $1/\sqrt{2}$, for
later convenience. Thus, in orthogonal representation, the pure state $U^a[n]$ becomes
$\mathbf{u}^a[n]/\sqrt{2}$, the maximal effect $V_a[n]$ becomes $\mathbf{v}_a[n]/\sqrt{2}$, and the identity
effect $T_{a_1}$ becomes $\mathbf{t}_a/\sqrt{2}$. We will keep upper case letters to represent
transformations and drop the indices. The reversible transformation $R_{a_2}^{a_1}$ becomes $R$
(we could write it as $R_a$ but this is unnecessary) and its inverse will be $R^T$ where
$T$ denotes transpose (this is true because we are in an orthogonal representation
so $R^T R = 1$). By T31 a general pure state, $\mathbf{u}^a$, is equal to $R\mathbf{u}^a[1]$ for some
particular pure state $\mathbf{u}^a[1]$. This pure state is identified by the maximal effect
$\mathbf{u}_a = R\mathbf{u}_a[1]$ since

$$\mathrm{Prob}(U^{a_1} U_{a_1}) = \tfrac{1}{2}\, \mathbf{u}_a \cdot \mathbf{u}^a \tag{275}$$
$$= \tfrac{1}{2}\, (R\mathbf{u}_a[1]) \cdot R\mathbf{u}^a[1] \tag{276}$$
$$= \tfrac{1}{2}\, \mathbf{u}_a[1] \cdot R^T R\, \mathbf{u}^a[1] = 1 \tag{277}$$

Note here that we need to include a factor of $\tfrac{1}{2}$ (this comes from the factors
introduced above) when calculating probabilities. The maximal effect identifying
a given pure state is, by P1, unique. Hence, all maximal effects can be
written as $\mathbf{u}_a = R\mathbf{u}_a[1]$. We know that the deterministic effect is unchanged by
the action of the group $\mathcal{G}$ of reversible transformations.

$$R\mathbf{t}_a = \mathbf{t}_a := \mathbf{t} \tag{278}$$

Hence we can expand a general maximal effect, $\mathbf{v}_a$, as

$$\mathbf{v}_a = v_0 \mathbf{s} + \vec{v}_a \tag{279}$$

and a general pure state, $\mathbf{u}^a$, as

$$\mathbf{u}^a = u_0 \mathbf{s} + \vec{u}^a \tag{280}$$

where

$$\mathbf{s} = \frac{\mathbf{t}}{|\mathbf{t}|} \tag{281}$$

and $\vec{v}_a$ and $\vec{u}^a$ are orthogonal to $\mathbf{s}$. Since $\mathbf{s}$ is invariant under the group $\mathcal{G}$
of reversible transformations, $v_0$ and $u_0$ are
constants. It follows from this fact and the fact that we have an orthogonal
group that the lengths, $|\vec{v}_a|$ and $|\vec{u}^a|$, are constant for these maximal effects and
pure states. We choose $v_0 = 1$ for maximal effects. We are free to make this
choice since we can absorb any factor into $u_0$ (it is $v_0 u_0$ that appears in the
equation for the probability). This implies that $\mathbf{t} = 2\mathbf{s}$ (so $|\mathbf{t}| = 2$) because
$\mathbf{t} = \mathbf{v}_a[1] + \mathbf{v}_a[\bar{1}]$. The latter also implies

$$\vec{v}_a[1] + \vec{v}_a[\bar{1}] = 0 \tag{282}$$

Hence,

T45 For a gebit, the maximal effects corresponding to a given maxi- 
mal measurement are represented by antipodal points on the hyper- 
sphere. 

We will prove a similar result for states in a maximal distinguishable set later 
(this is a more difficult thing to prove). 

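Now, $\mathrm{Prob}(T_{a_1} U^{a_1}) = \tfrac{1}{2}\, \mathbf{t}_a \cdot \mathbf{u}^a = 1$ and hence $u_0 = 1$. This gives,
writing $\vec{v}_a$ and $\vec{u}^a$ for the components orthogonal to $\mathbf{s}$,

$$\mathrm{Prob}(V_{a_1} U^{a_1}) = \tfrac{1}{2}\, \mathbf{v}_a \cdot \mathbf{u}^a = \tfrac{1}{2}\left(1 + \vec{v}_a \cdot \vec{u}^a\right) \tag{283}$$

We can make $\vec{v}_a$ of unit length and then absorb the overall constant into $\vec{u}^a$.
Let $|\vec{u}^a| = u$.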

Let us summarize these results. The maximal effects correspond to points
on a unit $(K_a - 2)$-sphere (this is a sphere embedded in a $K_a - 1$ dimensional
space). The pure states correspond to points on a $(K_a - 2)$-sphere of radius
$u$. We will see later that $u = 1$. We do not know at this stage that all points
on these hyperspheres correspond to allowed maximal effects and allowed pure
states (we will prove this later).

The full set of states is in the convex hull of the pure states and the null
state. This means that they all live in (or on) a hyper-cone of length 1 (since
$u_0 = 1$) with the above hypersphere (of radius $u$) at the base. We do not know,
at this stage, that all points on or in the cone actually correspond to states. We
can represent a general state corresponding to a preparation, $B^a$, as

$$\mathbf{b}^a = b_0 \mathbf{s} + \vec{b}^a \tag{284}$$

so that $b_0 = 1$ for normalized states.

The information in $\mathbf{b}^a$ is contained in the vector

$$(b_0, b_1, b_2, \ldots, b_{K_a - 1}) \tag{285}$$

where $b_l$ are the components of $\vec{b}^a$ (numbered from 1 to $K_a - 1$) in some basis.
Since this is a cone of length 1 having a hypersphere of radius $u$ as base, we
have

$$\sum_{l=1}^{K_a - 1} b_l^2 \le u^2 b_0^2 \tag{286}$$

for vectors on or in the cone.

The general effect $C_a$ can be written

$$\mathbf{c}_a = c_0 \mathbf{s} + \vec{c}_a \tag{287}$$

The maximal effects have $c_0 = 1$. We have

$$\mathrm{Prob}(B^{a_1} C_{a_1}) = \tfrac{1}{2}\, \mathbf{b}^{a_1} \cdot \mathbf{c}_{a_1} \tag{288}$$

for a general effect and a general preparation.
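
The probability rule in this representation can be tried out numerically. The sketch below is illustrative only: it assumes $K_a = 4$ and $u = 1$ (both of which are established only later in the argument), and the names `full_vector`, `prob`, `state`, `effect_up` and `effect_down` are hypothetical. Vectors are written as $(x_0, x_1, x_2, x_3)$ with $x_0$ the coefficient of $\mathbf{s}$.

```python
import math

def dot(x, y):
    """Euclidean inner product of two equal-length tuples."""
    return sum(a * b for a, b in zip(x, y))

def full_vector(x0, rest):
    """Vector x0 * s plus a part orthogonal to s, written as (x0, x1, x2, ...)."""
    return (x0,) + tuple(rest)

def prob(state, effect):
    """Probability rule Prob = (1/2) * b . c in the orthogonal representation."""
    return 0.5 * dot(state, effect)

# A normalised pure state (b_0 = 1, orthogonal part of length u = 1).
theta = math.pi / 3
state = full_vector(1.0, (math.sin(theta), 0.0, math.cos(theta)))

# Two maximal effects with c_0 = 1 and antipodal orthogonal parts.
effect_up = full_vector(1.0, (0.0, 0.0, 1.0))
effect_down = full_vector(1.0, (0.0, 0.0, -1.0))

# Their sum is the deterministic effect t = 2s.
t = tuple(a + b for a, b in zip(effect_up, effect_down))

p_up = prob(state, effect_up)
p_down = prob(state, effect_down)
p_total = prob(state, t)
```

Here the two antipodal effects give probabilities $\tfrac{1}{2}(1 \pm \cos\theta)$, which sum to the probability of the deterministic effect, and the state saturates the cone inequality $\sum_l b_l^2 \le u^2 b_0^2$.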






11.2 Going over to the non-classical case 

We saw in T43 that $K_a = N_a^r$ with $r = 1, 2, 3, \ldots$. If $r = 1$ then, for a gebit,
we have $K_a = 2$. Hence, the hypersphere is 0 dimensional (embedded in a one
dimensional space). In other words, the pure states consist of two points, one
pointing in the positive $b_1$ direction, and the other pointing in the negative $b_1$
direction. The full set of states are convex combinations of these pure states
and the null state (so the cone is a triangle). This is the classical case where
the gebit is simply a bit. We have already shown that the $K_a = N_a$ case leads
to classical probability theory. Henceforth, we will assume that we are in the
case $K_a \ne N_a$. We can force this to be the case by using P4 (rather than
P4'). For the classical gebit there is no compound permutation transformation.
Importantly, this is the only point at which we use P4 rather than P4' in
the reconstruction. We could force non-classicality with any such additional
assumption that was inconsistent with classical probability theory. Since we are
intent on reconstructing quantum theory, any such additional assumption must,
of course, be consistent with quantum theory.

11.3 All points on hypersphere are populated 

In this subsection we will prove that all points on the hypersphere correspond
to (pure) states. To do this we will use P5 (for the first time in the paper) and
employ considerations involving a getrit (that is a system having $N_a = 3$). The
basic idea of the proof entails sending a non-flat set of states associated
with a gebit informational subset of the getrit through a filter associated with
a different gebit informational subset. The net result is that the states move
closer to a pole while remaining on the surface of the hypersphere (after being
normalized). If this is repeated we can get the non-flat set of states as close to
a pole as we like. As they are non-flat, they span the surface of the hypersphere
near this pole. Given Assump 3 we know that the set must be closed and
consequently there must exist an infinitesimal patch of pure states around the
pole. By transitivity (T31) we can move this patch around to any other place
there is a pure state. Since there must be an infinitesimal patch around any
such point the whole surface must be covered in pure states. We will now fill
out this argument in detail.
First we note 

T46 Filters are non-mixing. 

This is a trivial consequence of P5 and T21.
Using special filters, we can prove 

T47 For a non-classical getrit (that is a system having $K_a > N_a$ and
$N_a = 3$) we can have two distinct maximal measurements having
one (and only one) element in common.

Let one maximal measurement be $\{U_{a_1}[1], U_{a_1}[2], U_{a_1}[3]\}$. We will show we can
construct another, $\{U_{a_1}[1], U_{a_1}[2'], U_{a_1}[3']\}$, which has one element in common
with the first. Note we are employing slightly different notation from before:
the prime in $U_{a_1}[2']$ indicates that this is a different effect from $U_{a_1}[2]$ (previously
we would have denoted this by a different letter, e.g. $V_{a_1}[2]$). Consider a special
filter, $F_{a_1}^{a_2}[n]$, associated with informational subset $S_{23}$ having $O(S_{23}) = \{2, 3\}$. On
passing through this filter, the system becomes a gebit. If we follow this filter by
a $\{U_{a_1}[1], U_{a_1}[2], U_{a_1}[3]\}$ measurement then we effect the maximal measurement
$\{U_{a_1}[2], U_{a_1}[3]\}$ on the gebit. For this gebit, which is non-classical, $N = 2$ and
$r \ge 2$ (in the nonclassical case), so we have $K \ge 4$ (using T43). Hence there must
be at least four pure states. Since every pure state belongs to some maximal
distinguishable set of states (by T12) there must be at least two maximal sets
of distinguishable states. Consequently, there must be at least two distinct
maximal measurements. Thus, in addition to $\{U_{a_1}[2], U_{a_1}[3]\}$, there must be
at least one more. Let this be $\{U_{a_1}[2'], U_{a_1}[3']\}$. Since maximal measurements
correspond to antipodal points (T45) the four effects making up these two
maximal measurements must be distinct. If we follow the special filter defined
above by the maximal measurement, $\{U_{a_1}[2'], U_{a_1}[3']\}$, then, by the properties
of the special filter, we effect the maximal measurement $\{U_{a_1}[1], U_{a_1}[2'], U_{a_1}[3']\}$
on the original getrit. This has only one effect in common with the maximal
measurement we started with. This proves T47.
Next we will prove 

T48 If we have two maximal measurements $\{U_{a_1}[1], U_{a_1}[2], U_{a_1}[3]\}$ and
$\{U_{a_1}[1], U_{a_1}[2'], U_{a_1}[3']\}$ for a getrit having one, and only one, effect in
common then we have

$$\text{Prob}(A^{a_1} U_{a_1}[3]) = \alpha\beta \quad \text{for any } A^{a_1} \in S_{12'} \qquad (289)$$

where

$$\alpha = 1 - \text{Prob}(A^{a_1} U_{a_1}[1]) \quad \text{and} \quad \beta = 1 - \text{Prob}(U^{a_1}[2'] U_{a_1}[2]) \qquad (290)$$

and $O(S_{12'}) = \{1, 2'\}$.

To prove this, consider a special filter, $G_{a_1}^{a_2}[n]$, defined with respect to the
maximal measurement $\{U_{a_1}[1], U_{a_1}[2], U_{a_1}[3]\}$ having informational subset $S_{23}$ with
$O(S_{23}) = \{2, 3\}$. It follows from the properties of special filters that this is also
a special filter with respect to the maximal measurement $\{U_{a_1}[1], U_{a_1}[2'], U_{a_1}[3']\}$
with informational subset $S_{2'3'}$ where $O(S_{2'3'}) = \{2', 3'\}$. Consider a pure state
$A^{a_1}$ in $S_{12'}$. This state has

$$\text{Prob}(A^{a_1} U_{a_1}[1]) = 1 - \alpha \quad \text{and} \quad \text{Prob}(A^{a_1} U_{a_1}[3']) = 0 \qquad (291)$$

The first property follows from the definition of $\alpha$ above. The second follows from
the fact that the state is in $S_{12'}$. From P1 and the first property above it follows that
if $A^{a_1} \neq U^{a_1}[1]$ then $A^{a_1} G_{a_1}^{a_2}$ is not the null state (it will not be blocked
by the filter). Since filters are non-mixing, by T46 the state $A^{a_1} G_{a_1}^{a_2}$ must be
proportional to a pure state.

$$A^{a_1} G_{a_1}^{a_2} = \alpha V^{a_2} \qquad (292)$$

That the constant of proportionality is equal to $\alpha$ follows from the fact that the
probability of the state being absorbed by the filter is equal to $\text{Prob}(A^{a_1} U_{a_1}[1])$.
Hence, the probability that it is not absorbed is $\alpha$ (and this must be equal to
the normalization). The state, $V^{a_2}$, must be in $S_{2'3'}$ using T35 and the fact
that $G_{a_1}^{a_2}$ is associated with information subset $S_{2'3'}$ as shown above. From the
second property in (291) and the property of special filters that

$$U_{a_1}[3'] = G_{a_1}^{a_2} U_{a_2}[3'] \qquad (293)$$

we have

$$\text{Prob}(V^{a_1} U_{a_1}[3']) = 0 \qquad (294)$$

Hence,

$$\text{Prob}(V^{a_1} U_{a_1}[2']) = 1 \qquad (295)$$

as these two probabilities must add to one for a normalized gebit state in $S_{2'3'}$.
It follows from P1 that $V^{a_1} = U^{a_1}[2']$. Hence

$$A^{a_1} G_{a_1}^{a_2} = \alpha U^{a_2}[2'] \qquad (296)$$

Now $U^{a_1}[2'] U_{a_1}[2] = 1 - \beta$ and consequently $U^{a_1}[2'] U_{a_1}[3] = \beta$ (as these two
probabilities must add to 1). Hence,

$$\text{Prob}(A^{a_1} G_{a_1}^{a_2} U_{a_2}[3]) = \alpha\beta \qquad (297)$$

Using the special filter property

$$U_{a_1}[3] = G_{a_1}^{a_2} U_{a_2}[3] \qquad (298)$$

we have

$$\text{Prob}(A^{a_1} U_{a_1}[3]) = \text{Prob}(A^{a_1} G_{a_1}^{a_2} U_{a_2}[3]) = \alpha\beta \qquad (299)$$

This proves T48.
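As a sanity check, the relation in (289) can be tested numerically for the case where the getrit is a quantum three-level system. This is only a sketch under stated assumptions: the Born rule and the parametrization of the primed basis by an angle `theta` are assumptions of the example and are not part of the operational derivation; the function name `t48_probabilities` is hypothetical.

```python
import math

def t48_probabilities(theta, a, b):
    """Return (Prob(A U[3]), alpha * beta) for the state A = a|1> + b|2'>.

    Assumed quantum model (not part of the operational argument):
    |2'> = cos(theta)|2> + sin(theta)|3>, with Born-rule probabilities.
    The state A lies in S_{12'} because it has no |3'> component.
    """
    norm = math.sqrt(abs(a) ** 2 + abs(b) ** 2)
    a, b = a / norm, b / norm
    # Amplitudes of A in the unprimed basis {|1>, |2>, |3>}.
    amplitudes = [a, b * math.cos(theta), b * math.sin(theta)]
    prob_3 = abs(amplitudes[2]) ** 2
    alpha = 1.0 - abs(amplitudes[0]) ** 2   # 1 - Prob(A U[1]), as in (290)
    beta = 1.0 - math.cos(theta) ** 2       # 1 - Prob(U[2'] U[2]), as in (290)
    return prob_3, alpha * beta

lhs, rhs = t48_probabilities(0.7, 0.3 + 0.4j, 1.2 - 0.5j)
print(abs(lhs - rhs) < 1e-12)
```

In this model both sides reduce to $|b|^2 \sin^2\theta$, so the agreement is exact up to rounding.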

We will now prove our main result of this subsection. 

T49 The hypersphere. The states of a gebit are given by the
convex hull of the full set of points on a $(K_a - 2)$-sphere with the
null state.

For the classical case this has already been shown in Sec. 11.2. Now consider the
non-classical case. Consider the gebit corresponding to the informational subset
$S_{12'}$ where $O(S_{12'}) = \{1, 2'\}$ (these are states which do not give rise to $U_{a_1}[3']$).
There must be $K_b$ linearly independent pure states, $\{A^{a_1}[k] : k = 1 \text{ to } K_b\}$, in
$S_{12'}$. These constitute a non-flat set of states. Let the first two of these states be
$U^{a_1}[1]$ and $U^{a_1}[2']$ (these are the states which are identified by $U_{a_1}[1]$ and $U_{a_1}[2']$
respectively). We will call the other states the "in-between" states. Now send
this set of $K_b$ states through a special filter $F_{a_1}^{a_2}[n]$ with $O(S) = \{1, 2\}$. We denote
the new states by $B^{a_1}[k]$ (by T35 these states belong to the informational subset
$S_{12}$ where $O(S_{12}) = \{1, 2\}$). By P5 and T46 this set of states remains non-flat
and pure (up to normalization). The $U^{a_1}[1]$ state will pass through the filter
unchanged (as it is in $S_{12}$). As we will see, the $B^{a_1}[2]$ state will be parallel to
$U^{a_1}[2]$ while the in-between states will get closer to $U^{a_1}[1]$. On passing through
$F_{a_1}^{a_2}$ we can measure $\{U_{a_1}[1], U_{a_1}[2]\}$. By the properties of the special filter,

$$F_{a_1}^{a_2} U_{a_2}[1] = U_{a_1}[1] \qquad (300)$$

and

$$U_{a_1}[2] = F_{a_1}^{a_2} U_{a_2}[2] \qquad (301)$$

We have (using the same notation as in the proof of T47)

$$U_{a_1}[2] + U_{a_1}[3] = U_{a_1}[2'] + U_{a_1}[3'] \qquad (302)$$

where the + on the LHS indicates that we are coarse-graining over the outcomes
of $U_{a_1}[2]$ and $U_{a_1}[3]$ to form a new result. Equation (302) follows since if we add
$U_{a_1}[1]$ to both sides of this equation we get the deterministic effect which is
unique (by T18). Hence,

$$\text{Prob}(A^{a_1}[k] U_{a_1}[2]) + \text{Prob}(A^{a_1}[k] U_{a_1}[3]) = \text{Prob}(A^{a_1}[k] U_{a_1}[2']) + \text{Prob}(A^{a_1}[k] U_{a_1}[3']) \qquad (303)$$

Using T48 and the fact that $\text{Prob}(A^{a_1}[k] U_{a_1}[3']) = 0$ (since $A^{a_1}[k] \in S_{12'}$) we
obtain

$$\text{Prob}(A^{a_1}[k] U_{a_1}[2]) + \alpha\beta = \text{Prob}(A^{a_1}[k] U_{a_1}[2']) \qquad (304)$$

Using $B^{a_2}[k] = A^{a_1}[k] F_{a_1}^{a_2}$ and (301)

$$\text{Prob}(A^{a_1}[k] U_{a_1}[2]) = \text{Prob}(A^{a_1}[k] F_{a_1}^{a_2} U_{a_2}[2]) = \text{Prob}(B^{a_1}[k] U_{a_1}[2]) \qquad (305)$$

Putting all this together we obtain

$$\text{Prob}(B^{a_1}[k] U_{a_1}[1]) = \text{Prob}(A^{a_1}[k] U_{a_1}[1]) \qquad (306)$$
$$\text{Prob}(B^{a_1}[k] U_{a_1}[2]) = \text{Prob}(A^{a_1}[k] U_{a_1}[2']) - \alpha\beta \qquad (307)$$

The first property follows from (300) and the second property follows from (304)
and (305). Recall that $A^{a_1}[k] \in S_{12'}$ while $B^{a_1}[k] \in S_{12}$ so it makes sense to
compare these probabilities. We see that while the probability for the $U_{a_1}[1]$
outcome remains unchanged, the probability for the other outcome in the maximal
measurement (associated with $U_{a_1}[2]$ in $S_{12}$ and $U_{a_1}[2']$ in $S_{12'}$) necessarily
decreases. It follows that the states $B^{a_1}[k]$ are not normalized (except for the
$k = 1$ case). However, by T46 they are parallel to pure states (which are
normalized). Let these pure states be $C^{a_1}[k]$. Following through the mathematics
of normalization, we have $B^{a_1}[k] = (1 - \alpha\beta) C^{a_1}[k]$. Hence,

$$\text{Prob}(C^{a_1}[k] U_{a_1}[1]) = \frac{1}{1 - \alpha\beta} \text{Prob}(A^{a_1}[k] U_{a_1}[1]) \qquad (308)$$

$$\text{Prob}(C^{a_1}[k] U_{a_1}[2]) = \frac{1}{1 - \alpha\beta} \left( \text{Prob}(A^{a_1}[k] U_{a_1}[2']) - \alpha\beta \right) \qquad (309)$$

It follows from P1 that the states $C^{a_1}[k]$ are not equal to $U^{a_1}[1]$ (except for
$k = 1$). Hence the smallest system that can support them is a gebit. Since the
set remains non-flat by P5, the states $C^{a_1}[k]$ are linearly independent. Since
systems having the same $N_b$ ($N_b = 2$ in this case) are equivalent (by T40), there
must exist a set of states $A^{a_1}[k, i = 2]$ in $S_{12'}$ which bear the same relationship
with $U^{a_1}[1]$ as the states $C^{a_1}[k]$ do with $U^{a_1}[1]$ in $S_{12}$ ($i$ is the iteration number).
We can iterate this process by sending the states $A^{a_1}[k, i = 2]$ through the filter,
normalizing to get a new set of states $C^{a_1}[k, i = 3]$, then finding a set of states
$A^{a_1}[k, i = 3]$ in $S_{12'}$ which bear the same relationship with $U^{a_1}[1]$ as the states
$C^{a_1}[k, i = 3]$ do with $U^{a_1}[1]$ in $S_{12}$. Hence,

$$\text{Prob}(A^{a_1}[k, i+1] U_{a_1}[1]) = \frac{1}{1 - \alpha\beta} \text{Prob}(A^{a_1}[k, i] U_{a_1}[1]) \qquad (310)$$

$$\text{Prob}(A^{a_1}[k, i+1] U_{a_1}[2']) = \frac{1}{1 - \alpha\beta} \left( \text{Prob}(A^{a_1}[k, i] U_{a_1}[2']) - \alpha\beta \right) \qquad (311)$$

Now $\alpha > 0$ for any $A^{a_1}[k, i]$ that is distinct from $U^{a_1}[1]$ and we have $\beta > 0$
by P1 as $U_{a_1}[2]$ does not identify $U^{a_1}[2']$ (see the definitions of $\alpha$ and $\beta$ in (290)).
By application of P1 we see that, for $k = 1, 2$, the states remain unchanged on
iteration. For $k > 2$, however, the states $A^{a_1}[k, i+1]$ are closer to $U^{a_1}[1]$ than
the states $A^{a_1}[k, i]$ are by a finite amount, since then $\text{Prob}(A^{a_1}[k, i+1] U_{a_1}[1])$ is
bigger than $\text{Prob}(A^{a_1}[k, i] U_{a_1}[1])$ by a finite amount and $\text{Prob}(A^{a_1}[k, i+1] U_{a_1}[2'])$
is smaller than $\text{Prob}(A^{a_1}[k, i] U_{a_1}[2'])$ by a finite amount. Hence, in the limit, we
get

$$\text{Prob}(A^{a_1}[k, i = \infty] U_{a_1}[1]) = 1 \qquad (312)$$
$$\text{Prob}(A^{a_1}[k, i = \infty] U_{a_1}[2']) = 0 \qquad (313)$$

for $k > 2$ where $i$ is the iteration number. By P1 there is only one pure state
having these properties, namely $U^{a_1}[1]$ itself. Thus, in the limit, these states
become equal to $U^{a_1}[1]$ (except for the $k = 2$ case which remains unchanged).
However, after any finite number of steps we have a set of pure states which
are linearly independent and (except for the $k = 2$ case) as close to $U^{a_1}[1]$ as
we wish. We know by T31 that there exists a reversible transformation from
$U^{a_1}[1]$ to each of these linearly independent states. Since we can get these states
as close to $U^{a_1}[1]$ as we wish, we represent these transformations as $I + \epsilon_k X[k]$
(where $I$ is the identity) and regard $X[k]$ as generators. We proved in Appendix
C that the space of states is compact (by Assump 3). This means it is closed.
Hence, we can generate all states in an infinitesimal patch around the pure state
$U^{a_1}[1]$ on the surface of the hypersphere (the dimensionality of this patch being
the same as that of the surface of the hypersphere as the states for $k > 2$ are
linearly independent and there are $K_b - 2$ of them). By T31 this must be
true for any pure state. By moving this patch infinitesimally we can cover the
surface of the hypersphere with pure states. The full set of states is the convex
sum of the pure states and the null state. This proves T49.
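The filter-and-renormalize iteration in (310) and (311) can be followed numerically for a single state. The sketch below is illustrative only: it tracks just the two probabilities $\text{Prob}(A U[1])$ and $\text{Prob}(A U[2'])$ of one normalized state in $S_{12'}$, recomputes $\alpha$ from the current state at every step, and treats $\beta$ as a fixed input. The function name `iterate_filter` and the chosen numbers are assumptions of the example.

```python
def iterate_filter(p1, beta, steps):
    """Iterate the update in (310)-(311) for one normalized gebit state in S_{12'}.

    p1 is Prob(A U[1]); normalization gives Prob(A U[2']) = 1 - p1.
    Requires alpha * beta < 1 at every step (true whenever beta < 1).
    Returns the list of (Prob(A U[1]), Prob(A U[2'])) pairs, one per iteration.
    """
    p2 = 1.0 - p1
    history = [(p1, p2)]
    for _ in range(steps):
        alpha = 1.0 - p1              # alpha from (290) for the current state
        scale = 1.0 - alpha * beta    # normalization after the filter
        p1, p2 = p1 / scale, (p2 - alpha * beta) / scale
        history.append((p1, p2))
    return history

# An "in-between" state drifts towards U[1]; the two poles stay put.
print(iterate_filter(0.2, 0.5, 5))
print(iterate_filter(1.0, 0.5, 3)[-1], iterate_filter(0.0, 0.5, 3)[-1])
```

In this toy run the first probability increases at every step and approaches 1, while the states with $\text{Prob}(A U[1])$ equal to 1 or 0 (the $k = 1$ and $k = 2$ cases) are left unchanged, in line with (312) and (313).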

We will now prove a few results to conclude this subsection on gebits. 

T50 There exist basis choices in which the vector representing any
pure state for a gebit is equal to the vector representing the maximal
effect which identifies it.

Consider a maximal effect represented by a point, $\vec{v}_a$, on a hypersphere of radius
1 (here we are using the notation of equations (279), (280), and (283)). There must
exist a pure state on the state hypersphere with vector $\vec{v}^a$ pointing in the same
radial direction. Any other pure state will, by equation (283), have smaller probability.
Hence, this must be the state that is identified by the given effect. It follows from
equation (283) that $\vec{v}_a \cdot \vec{v}^a = 1$. It follows that $u = 1$, where $u$ is the radius of
the hypersphere of pure states, and that $\vec{v}^a = \vec{v}_a$. Further, from (279, 280) and
the fact that $u_0 = v_0 = 1$ we have $v_a \cdot v^a = 1$. Every other pure state must be
identified by some maximal effect which must, therefore, be represented by a
vector that is equal to the vector representing the pure state in this basis. This
proves T50.

We can now prove a useful result concerning gebit effects. 

T51 For gebits, effects that are proportional to any given maximal
effect can only be written as a sum of effects that are also proportional
to this same maximal effect.

By T50 and T49 the maximal effects cover the unit hypersphere. The
surface of the cone subtended by this hypersphere has effects which are
proportional to maximal effects. Consider one such effect, $c_a = \mu v_a$ where $0 < \mu < 1$.
If this can be written as the sum of two vectors that are not proportional to
each other then one of these vectors must lie outside the cone. Consider such a
vector, $b_a$. Such vectors give rise to negative probabilities and so cannot
represent states. To see that they give rise to negative probabilities, consider the
pure state, $u^a$, that is opposite $b_a$ (by opposite, we mean that if $b_a = b_0 s + \vec{b}_a$
then $u^a = s - \nu \vec{b}^a$ for some positive $\nu$). The maximal effect opposite this pure
state gives probability 0 and hence is orthogonal to it. Hence, the vector $b_a$
subtends an angle greater than 90° and so gives a negative probability for $u^a$.
This proves T51.

Another result follows from this. 

T52 For a gebit, any effect that gives probability zero for a pure
state, $U^a[2]$, is proportional to the maximal effect, $U_a[1]$, which
identifies the pure state $U^a[1]$. Here $U^a[1]$ and $U^a[2]$ form a maximal
distinguishable set.

As established in the proof of T51, all effects must lie inside the cone (in the
sense that they cannot subtend an angle with $s$ that is greater than that
subtended by any maximal effect). The angle at the base of the cone is 90° as
established in the previous proof. Further, this cone coincides with the cone of
states. Hence, any effect that gives zero probability for a given pure state must
be proportional to the effect identifying the opposite pure state. This proves
T52.

Similarly, 

T53 For a gebit, any state that gives probability zero for a maximal
effect, $U_a[2]$, is proportional to the pure state, $U^a[1]$, that is
identified by the maximal effect $U_a[1]$. Here, $\{U_a[1], U_a[2]\}$ is a maximal
measurement.

This follows for the same geometric reasons as T52.



11.4 Theorem concerning non-flattening transformations 

We will prove the following useful theorem 

T54 Any transformation formed from operations consisting only of 
pure preparations, reversible transformations, and maximal results 
is non-flattening. 

We will illustrate the proof of this with an example. Consider the transformation 




(314) 



where A and B are pure preparations, R and S are reversible transformations, 
and C and D are maximal effects. We can put this transformation in the form 




(315) 



Here we have taken all the preparations to the bottom left, all the results to 
the top left, we have pulled the open inputs to the bottom right and the open 
outputs out to the top right. We can put any transformation in this form simply 
by pulling all the preparations down to the left, all the open inputs down to the 
right, all the results up to the left, and all the open outputs up to the right. In 
the middle we will have a bunch of reversible transformations. We will now show
that, so long as we have only pure preparations, reversible transformations, and
maximal results, any such transformation is equivalent to the following
transformation

[Circuit diagram: boxes U, Q, F, and T] (316)



where U is a pure preparation, Q is a reversible transformation, F is a filter
having capacity equal to one, and T is the deterministic result. To see this
first we note, by T25, that if each of the components of a system has a
pure preparation then the composite preparation is also pure. Hence we can
replace all the preparations with a single pure preparation for some (generally
composite) system of type a. We can regard all the input wires on the right as
constituting a single system (by T23) which we represent by a system of type
b. Similar remarks apply to the output wires (which we represent by a system
of type d). We know by P2 that a result on a composite system is maximal
if it is comprised of maximal results on each of the components. Hence, we
can represent the effect of all the maximal results on the upper left by a single
maximal result (on a system, possibly composite, we take to be of type c). Any
maximal result is equivalent to a special filter of capacity one followed by the
deterministic result (this follows from the properties of special filters and the
fact there is, according to P1, only one maximal effect identifying a given pure
state). The special filter is chosen so that it transmits unchanged only states
proportional to the pure state that is identified by the given maximal result.
The system after the filter will be regarded as a system of type e (this is the
filtered c type). Since the filter has capacity equal to one, $N_e = 1$. Now we
come to the bunch of reversible transformations in the middle. These can be
regarded as a bunch of reversible transformations in parallel followed by another
bunch of transformations in parallel and so on. The wires can be regarded as
the identity transformation (which is a reversible transformation). We know
from T32 that two or more reversible transformations in parallel constitute
a reversible transformation. Further, the sequential composition of reversible
transformations gives rise to a reversible transformation itself. Hence, the overall
transformation is reversible. We represent this by Q. If we send a non-flat set
of states into the transformation for the system of type b shown in (316) then
it follows from P1 and P2 that we have a non-flat set of states for the system
ab. This is because, as $U^{a_1}$ is pure we can, by P1, find a maximal measurement
for which it only gives rise to a single outcome, and then by P2, this maximal
measurement on a along with the maximal measurement on b with respect to
which the non-flat set is spanning, constitute a maximal measurement on the
composite with respect to which we must have a non-flat set of states. A
non-flat set of states must remain non-flat after a reversible transformation. Hence,
a non-flat set emerges from the transformation Q. Next the state is subject
to a filter on system c and the identity transformation on d. The identity
transformation can be regarded as a filter (the "do nothing" filter). We know
from T24 that two filters in parallel constitute a filter on the composite system.
Hence, we have a non-flat set of states after the filter. We know that $N_e = 1$.
It follows from T43 that $K_e = 1$. This means that $e_5 = 1$ only (as the label
$e_5$ must run from 1 to $K_e$). Hence we can write the state, $E^{e_5 d_1}$, after passing
through the filter as $G^{e_5} H^{d_1}$ where $G^{e_5 = 1} = \text{constant}$ (the only component of
$G^{e_5}$ is equal to some constant). Since $K_e = 1$ a maximal measurement will
only have one associated maximal effect which must, then, be equal to T, the
deterministic effect. Hence, if the set of states for ed is non-flat, then by P1 and
P2, the set of states for d, taken alone, must be non-flat (for similar reasons as
just given for the ab case). This proves T54.

An immediate consequence of this theorem is the following. 

T55 Any preparation formed from operations consisting only of pure 
preparations, reversible transformations, and maximal results pre- 
pares a state proportional to a pure state. 

Any preparation of this sort can be formed by sending a system prepared by
a pure preparation into the type of transformation considered in proving T54.
Since all non-flattening transformations are, by T21, also non-mixing this
implies that the system emerging must be in a state proportional to a pure state.

An immediate consequence of this theorem is that if we perform a maximal 
measurement on one component of a bipartite system prepared in a pure state 
(which may be entangled) then, for each outcome of this measurement, we obtain 
a pure state (up to normalization) on the other side. 

11.5 Entanglement, teleportation and entanglement swapping

In this subsection and the following one we adapt the techniques of Chiribella,
D'Ariano, and Perinotti to the present situation to show that $K_a = N_a^2$. We
will work with gebits of type a. We will work with a particular set of
distinguishable states, $\{U^{a_1}[1], U^{a_1}[2]\}$, which we will think of as corresponding to
the computational basis. Corresponding to this is the maximal measurement
$\{U_{a_1}[1], U_{a_1}[2]\}$. We define an equatorial state to be one that is pure and has
probability $\frac{1}{2}$ associated with each of the two outcomes of the computational basis (if
the computational states lie on the poles of the hypersphere then the equatorial
states lie on the equator). We select one particular equatorial state, $U^{a_1}[+]$,
which we will use frequently (let $U^{a_1}[-]$ be the opposite state where the two
together comprise a maximal distinguishable set). The corresponding maximal
effect is $U_{a_1}[+]$. We have

$$\text{Prob}(U_{a_1}[1] U^{a_1}[+]) = \text{Prob}(U_{a_1}[2] U^{a_1}[+]) = \frac{1}{2} \qquad (317)$$

(and similar equations for any other equatorial state).

A product state is one that can be written as $A^{a_1} B^{a_2}$. We will say that
any pure state that cannot be written as a product state is an entangled pure
state. Consider two gebits. We can define the informational subset $S_{\{11,22\}}$ with
respect to the maximal measurement, $\{U_{a_1}[m] U_{a_2}[n] : mn = 11, 12, 21, 22\}$, where
$O(S_{\{11,22\}}) = \{11, 22\}$. A pure state, $E^{a_1 a_2}$, in $S_{\{11,22\}}$ is entangled if

$$p_{11} > 0 \quad \text{and} \quad p_{22} > 0 \qquad (318)$$

where

$$p_{11} = \text{Prob}(U_{a_1}[1] U_{a_2}[1] E^{a_1 a_2}) \quad \text{and} \quad p_{22} = \text{Prob}(U_{a_1}[2] U_{a_2}[2] E^{a_1 a_2}) \qquad (319)$$

This is equivalent to saying that $E^{a_1 a_2}$ is not equal to either $U^{a_1}[1] U^{a_2}[1]$ or
$U^{a_1}[2] U^{a_2}[2]$. These are the only product states in $S_{\{11,22\}}$ since any other
product states for these two gebits would have some probability associated with
the 12 and/or 21 terms. Hence, this is equivalent to saying that the state is not
a product state.

A state, $D^{a_1 a_2}$, in $S_{\{11,22\}}$ will be said to be maximally entangled if it is pure
and if

$$\text{Prob}(U_{a_1}[1] U_{a_2}[1] D^{a_1 a_2}) = \text{Prob}(U_{a_1}[2] U_{a_2}[2] D^{a_1 a_2}) = \frac{1}{2} \qquad (320)$$

Maximally entangled states will play an important role in this subsection and
the next. We will define a canonical maximally entangled state, $M^{a_1 a_2}$, below.
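The definitions in (318) to (320) amount to a simple case distinction on the two probabilities $p_{11}$ and $p_{22}$. A minimal sketch of that bookkeeping follows, assuming the input describes a normalized pure state in $S_{\{11,22\}}$ (so that $p_{11} + p_{22} = 1$); the function name `classify` and the tolerance parameter are hypothetical.

```python
def classify(p11, p22, tol=1e-12):
    """Classify a normalized pure state in S_{11,22} from p11 and p22 of (319)."""
    if abs(p11 + p22 - 1.0) > tol:
        raise ValueError("a normalized state in S_{11,22} needs p11 + p22 = 1")
    if p11 <= tol or p22 <= tol:
        return "product"              # one of the two product states in S_{11,22}
    if abs(p11 - 0.5) <= tol and abs(p22 - 0.5) <= tol:
        return "maximally entangled"  # condition (320)
    return "entangled"                # condition (318)

print(classify(1.0, 0.0), classify(0.3, 0.7), classify(0.5, 0.5))
```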

One way to produce entangled pure states is to use the permutation
transformation $P_{\text{cnot}}$ which effects the following permutation

$$\pi_{\text{cnot}} = \begin{pmatrix} 11 \to 11 \\ 12 \to 12 \\ 21 \to 22 \\ 22 \to 21 \end{pmatrix} \qquad (321)$$
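The permutation in (321) can be written down as a lookup table and checked directly. The sketch below only tracks outcome probabilities in the computational basis, so it says nothing about purity or about other measurements; the names `PI_CNOT`, `apply_perm`, and `is_self_inverse`, and the example input distribution, are assumptions of the illustration.

```python
# Permutation (321) on the computational-basis labels of two gebits.
PI_CNOT = {"11": "11", "12": "12", "21": "22", "22": "21"}

def apply_perm(dist, perm=PI_CNOT):
    """Push a probability distribution over basis labels through a permutation."""
    out = {label: 0.0 for label in perm}
    for label, p in dist.items():
        out[perm[label]] += p
    return out

def is_self_inverse(perm):
    """True if applying the permutation twice returns every label to itself."""
    return all(perm[perm[label]] == label for label in perm)

print(is_self_inverse(PI_CNOT))
# First gebit with probability 1/2 for each outcome, second gebit in state 1.
print(apply_perm({"11": 0.5, "12": 0.0, "21": 0.5, "22": 0.0}))
```

For this example input the output distribution is supported on the labels 11 and 22 only, with probability 1/2 each, which matches the computational-basis statistics required by (320).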



The inverse transformation, $P_{\text{cnot}}^{-1}$, effects the same permutation (the inverse of a
cnot is a cnot). We note that

[Circuit diagram: two equations, left and right, involving $P_{\text{cnot}}$] (322)

(since we are working in a fixed computational basis, we can simply put "1"
and "+" rather than U[1] and U[+] inside the boxes). The left equation follows
by virtue of the choice of permutation and the fact that, according to P1, there
is only one maximal effect identifying $U^{a_1}[1]$. The right equation follows for
similar reasons.

It follows from (322), T25, and the fact that $P_{\text{cnot}}$ is reversible, that

[Circuit diagram (labels recovered: A, 1)] (323)

produces an entangled pure state as long as $A^{a_1}$ is not equal to either of the
computational states. If $A^{a_1}$ is equatorial then we produce a maximally entangled
state. One example of this is the following preparation

[Circuit diagram not recoverable] (324)

We will take this to be our canonical maximally entangled state which we will
denote $M^{a_1 a_2}$:

[Circuit diagram defining $M^{a_1 a_2}$ using $P_{\text{cnot}}$] (325)



This entangled state is identified by the maximal result

[Circuit diagram defining the result M using $P_{\text{cnot}}$] (326)

as $P_{\text{cnot}}^{-1}$ is the inverse of $P_{\text{cnot}}$. That this is a maximal result is clear since the
measurement

[Circuit diagram involving $P_{\text{cnot}}$] (327)

distinguishes the four states

[Circuit diagram involving $P_{\text{cnot}}$] (328)

Hence, the measurement is maximal and M is a maximal result (which we will
take to be canonical).

We are now in a position to prove the following theorem 

T56 Consider the gebit preparation

[Circuit diagram (labels recovered: M, E)] (329)

where E is an entangled pure state in $S_{\{11,22\}}$ and M is the canonical
maximal result defined in (326) above. We can use this preparation
to prepare a state proportional to any pure state by making an
appropriate choice of preparation B.

It follows from T54 that, if we send a non-flat set of states in for $B^{a_1}$, then we
must get a non-flat set of states out (call the corresponding output states $A^{a_3}$).
Further, it follows from T21 that, if $B^{a_1}$ is pure, then $A^{a_3}$ must be proportional
to a pure state. It follows from the fact that $E^{a_2 a_3}$ is entangled that the smallest
system that can support the output states is a gebit. To see this we will note
that

[Two circuit-diagram equations, left and right] (330)

where the $\frac{1}{2}$ in each equation indicates that the effects corresponding to these
results are proportional with constant of proportionality equal to $\frac{1}{2}$. To prove
the equation on the left consider sending in $U^{a_1}[1]$ and then $U^{a_1}[2]$. It follows
from the definition of the M effect in (326) that when we send in $U^{a_1}[1]$ the
probability is $\frac{1}{2}$ and when we send in $U^{a_1}[2]$ the probability is 0. The equation
on the left then follows from T52. The equation on the right follows by similar
reasoning. We can also prove

[Two circuit-diagram equations, left and right] (331)

where $p_{11}$ and $p_{22}$ are defined in (319). To prove the equation on the left, we
note that if we put the result $U_{a_3}[2]$ on the output of the LHS of this equation,
then we must get zero since $E^{a_2 a_3}$ is in $S_{\{11,22\}}$. Hence the equation follows
from T53 (and the constant of proportionality, $p_{11}$, follows from considering
the definition in (319)). The equation on the right follows by similar reasoning.
It follows from (330, 331) that if we put $B^{a_1} = U^{a_1}[1]$ in we get $\frac{p_{11}}{2} U^{a_3}[1]$
out, and if we put $B^{a_1} = U^{a_1}[2]$ in we get $\frac{p_{22}}{2} U^{a_3}[2]$ out. Since the state $E^{a_2 a_3}$ is
entangled we have, by (319), that $p_{11}$ and $p_{22}$ are non-zero. Hence the smallest
system that can support the preparation in (329) for all pure inputs, $B^{a_1}$, is
a gebit. We established that these output states must be non-flat. The input
states correspond to the full set of pure states on a hypersphere. The output set
of states corresponds to states that are proportional to pure states. Hence we have
a linear transformation on a hypersphere of input states, $B^{a_1}$, to a set of output
states, $A^{a_3}$, having the same dimension. Under such a linear transformation,
a hypersphere can only transform to a hyper-ellipsoid of the same dimension.
Hence, there exists an input state $B^{a_1}$ giving rise to any point on this output
hyper-ellipsoid. These points on the hyper-ellipsoid correspond to states that are
proportional to pure states. For every point on the output hyper-ellipsoid there
must be a corresponding point on the input hypersphere. This proves T56.
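For orientation, the linear map from input states to unnormalized output states can be written out explicitly in the special case where the gebits are quantum two-level systems. Everything in the sketch below is an assumption of that special case and not part of the operational proof: the entangled state is taken to be $\sqrt{p_{11}}|11\rangle + \sqrt{p_{22}}|22\rangle$ with real coefficients, the effect M is taken to be the projection onto $(|11\rangle + |22\rangle)/\sqrt{2}$, and the function name `t56_output` is hypothetical.

```python
import math

def t56_output(a, b, p11, p22):
    """Unnormalized output amplitudes for the input B = a|1> + b|2>.

    Assumed quantum model: E = sqrt(p11)|11> + sqrt(p22)|22> on wires 2 and 3,
    and the effect M = (<11| + <22|)/sqrt(2) applied to wires 1 and 2.
    Returns (amplitudes on wire 3, total probability weight of the output).
    """
    norm = math.sqrt(abs(a) ** 2 + abs(b) ** 2)
    a, b = a / norm, b / norm
    out = (a * math.sqrt(p11 / 2.0), b * math.sqrt(p22 / 2.0))
    weight = abs(out[0]) ** 2 + abs(out[1]) ** 2
    return out, weight

print(t56_output(1, 0, 0.3, 0.7)[1])   # weight p11 / 2 for input U[1]
print(t56_output(0, 1, 0.3, 0.7)[1])   # weight p22 / 2 for input U[2]
```

In this model the map rescales the two amplitudes by different positive factors, which is an invertible linear map whenever $p_{11}$ and $p_{22}$ are both non-zero, so every pure output direction is reached by some input, as the theorem asserts.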
We now prove the following theorem 

T57 The preparation

[Circuit diagram containing two $P_{\text{cnot}}$ boxes] (332)

prepares a state proportional to a maximally entangled state in
$S_{\{11,22\}}$ where the constant of proportionality is $\frac{1}{2}$.

To see this note first that the set of results

[Circuit diagram] (333)

constitute a maximal measurement on the input gebits in the computational
basis (11, 12, 21, 22). This is clear because each of the input states $U^{a_1}[i] U^{a_2}[j]$
into this measurement must give rise to a different outcome for $mn$ by virtue
of the fact that we have permutations in the computational basis. Now consider
the transformation

[Circuit diagram containing two $P_{\text{cnot}}$ boxes] (334)

By following through the effect of the permutations (recall that $\pi_{\text{cnot}}$ is equal
to its own inverse) we see that

$$\text{Prob}\big([\text{circuit diagram}]\big) = \begin{cases} 1 & \text{if } ij = mn = 11 \\ 1 & \text{if } ij = mn = 22 \\ 0 & \text{else} \end{cases} \qquad (335)$$

This probability is zero whenever $m$ is not equal to $n$. Since (333) constitute
the effects of a maximal measurement, it follows from T16 that we must have
probability zero for $m$ and $n$ to be different whatever state we send into
transformation R. And hence it follows that this transformation R outputs states in
$S_{\{11,22\}}$. This means we prepare a gebit (as $|O(S_{\{11,22\}})| = 2$). It follows from
P1 and (335) that

[Two circuit-diagram equations joined by "and"] (336)

Hence,

[Circuit-diagram equation] (337)

We know by T55 that the state prepared by (332) is proportional to a pure state.
Hence it follows from (337) that the state prepared in (332) is proportional to
a maximally entangled state with constant of proportionality $\frac{1}{2}$.

Next we will prove the following:



T58 Entanglement swapping. There exists a choice of pure state
$B^{a_1}$ such that the following holds

[Circuit-diagram equation] (338)

Further, $B^{a_1}$ must be equatorial.



The preparation on the LHS of (338) must be in $S_{\{11,22\}}$. To see this we note
that

[Circuit-diagram equation involving M] (339)

and

[Circuit-diagram equation involving M] (340)

This follows from T53 and the fact that $\text{Prob}(M^{a_1 a_2} U_{a_1}[m] U_{a_2}[n]) = \frac{1}{2} \delta_{mn}$ as
$M^{a_1 a_2}$ is a maximally entangled state in $S_{\{11,22\}}$. Hence, regardless of what we
choose for the state $B^a$, we must have

[Two circuit-diagram probability equations] (341)



by virtue of the choice of permutation associated with $P_{\text{cnot}}$. This proves that
the preparation on the LHS of (338) is in $S_{\{11,22\}}$. It follows from T55 that
this state is proportional to a pure state. Given that the state on the LHS of
(338) is pure and in $S_{\{11,22\}}$, it follows from T56, T57, and the definition of the
canonical maximally entangled state, $M^{a_1 a_2}$, in (325) that there exists a state
$B^a$ such that

[Circuit-diagram equation] (342)

where $\gamma$ is to be determined. In obtaining (342), the preparation in T57 plays
the role of $E^{a_1 a_2}$ in T56. These circuit diagrams are interpreted graphically. We
can rewrite the above equation as

[Circuit-diagram equation with a dotted line] (343)

We have already established that the preparation up to the dotted line is a pure
state in $S_{\{11,22\}}$. The maximal results,

[Circuit diagrams] (344)

form a maximal measurement for the gebit consisting of states in $S_{\{11,22\}}$ as
they distinguish the 11 and 22 states. It follows from T18 that, for gebits in
$S_{\{11,22\}}$ inputted into the transformation

[Circuit diagram] (345)

we always get $n = 1$. Hence, it follows that the maximal results

[Circuit diagrams] (346)

form a maximal measurement in $S_{\{11,22\}}$. If we were to place the maximal effect
on the right after the dotted line in (343) it follows from this equation that we
would get probability zero (as $U^{a_1}[+] U_{a_1}[-] = 0$). Consequently (338) follows
from (343) and T53 though we have yet to demonstrate that the constant of
proportionality is the one stated in (338). It is easy to see that

[Circuit-diagram equation with separate cases for $m = n$ and $m \neq n$] (347)

From this and T51 it follows that

[Circuit-diagram equation] (348)

Using (339, 340, 348) and the properties of $P_{\text{cnot}}$ we obtain

[Circuit-diagram equation] (349)

Using (339, 340) we obtain

[Circuit-diagram equation] (350)

Since we have established that the LHS of (349) is proportional to the LHS of
(350), the same must be true for the right hand sides of these equations. This
tells us that $B^a$ is equatorial (since we get the same $B^{a_1} U_{a_1}[n]$ for $n = 1$ and $n = 2$)
and that the constant of proportionality is the one stated in (338). This proves T58.

The | above can be thought of as \ times \. The \ is the standard success 
probability for teleportation (entanglement swapping can be thought of as an 
application of teleportation). The \ comes from the following result. 

T59 If the state B a is equatorial we have 
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for all H ai32 . 
To prove this consider 
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(352) 



where $T$ is the deterministic effect and $C$ is an arbitrary preparation. By T18
we can write $T_{\mathsf{a}_1} = U_{\mathsf{a}_1}[1] + U_{\mathsf{a}_1}[2]$. Hence the probability on the right in the
above equation is equal to
(353)



where the $\frac{1}{2}$ follows from the properties of the permutation transformation (see
comments below (344)) and the fact that $B^{\mathsf{a}_1}$ is equatorial. Hence, T59 follows.
We will now show that

T60 Probabilistic teleportation. The following holds




(354) 



for any state $A^{\mathsf{a}_1}$, where $B^{\mathsf{a}_1}$ is the equatorial state in T58.

We prove this important theorem following the technique of Chiribella, D'Ariano,
and Perinotti. First, we note from T56 that there exists a state $D^{\mathsf{a}_1}$ such that
where $\mu$ is a constant of proportionality ($0 < \mu < 1$). Hence,
(356)



where we have used (355) in the first step, T58 in the second step, and (355)
again in the final step. This proves T60.



11.6 Proving that $K = N^2$

We have already proven that $K = N^r$ where $r = 1, 2, \ldots$ (in T43). We will
now prove that, in the non-classical case (where $r > 1$), we must have $r = 2$. We
note

T61 If $K_a \le 4$ for a gebit then, in the non-classical case, $K = N^2$
for all systems.

This follows immediately since $r = 2$ is the only non-classical case consistent
with $K = N^r$ if $K \le 4$ when $N = 2$.
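The counting step behind T61 can be checked mechanically. The following is only an illustrative sketch; the function name `allowed_exponents` and the cut-off `r_max` are assumptions introduced here and do not appear in the paper.

```python
def allowed_exponents(n_dim, k_max, r_max=10):
    """Exponents r >= 1 for which K = n_dim**r does not exceed k_max."""
    return [r for r in range(1, r_max + 1) if n_dim ** r <= k_max]

# A gebit has N = 2; T61 assumes K <= 4 for it.
exponents = allowed_exponents(2, 4)

# The non-classical case is r > 1.
nonclassical = [r for r in exponents if r > 1]
```

With $N = 2$ and $K \le 4$ the only surviving exponents are the classical $r = 1$ and $r = 2$, which is the content of T61.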

We will prove that $K_a \le 4$ for a gebit by using the ingenious techniques
developed by Chiribella, D'Ariano, and Perinotti (CDP). In particular, see Lemma 22
of [H] and Sec. IX.C of [13]. In the previous subsection we have laid the
groundwork for this proof by proving the entanglement swapping result in T58 and
the probabilistic teleportation result T60. CDP proved similar results (though
using quite different techniques since they have different postulates). There are
a few differences in setting up the theorem below compared with CDP
because we start with a different set of postulates: first, we are working with gebits
rather than general systems (the result T61 bridges this gap in our case); and
second, we are carrying around an extra factor of $\frac{1}{2}$ whose origin was explained in
T59. We will now prove the following important result.

T62 State space dimension. In the nonclassical case we have
$K_a = N_a^2$

It follows from TlMland P3 that 



[Circuit diagram equation] (357)



We use the $\equiv$ symbol rather than $=$ (see Sec. 5.1) because it is possible on the
LHS, but not on the RHS, to feed the output into the input. To prove (357)
from T60 we need to invoke P5. We could send one component of a composite
system into input on the LHS (or RHS) of (357). In this case, P5 ensures
that product effects are sufficient to characterize transformations. If we have
a product effect then we, effectively, are reduced back to the situation in T60.
From T59 we have



[Circuit diagram equation for a probability] (358)



The circuit in (358) corresponds to taking the output on the LHS of (357) and
feeding it into the input. We can do this because the causal structure of the
fragment on the LHS of (357) allows it. However, we cannot make this happen
on the RHS of (357) since the causal structure does not allow it. Nevertheless,
we can make this happen mathematically and it corresponds to taking the trace.
We will show how to do this. For convenience we put



[Circuit diagram definitions] (359)
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Then, invoking TQT] and T lTOl (HJSTf gives 



N aia2 M a ^ = lz£ (360) 



where $\hat{I}$ is the identity. Hence,



since the trace of the identity is equal to the dimension of the space on which
it acts. But (358) gives us

N aia2 M a ^ < X - (362) 
It follows that, for a gebit, $K_a \le 4$. Hence, by T61, T62 follows.



11.7 The Bloch sphere 

We now see immediately that a gebit is, in fact, a qubit. 

T63 The Bloch sphere. The pure states, $U^{\mathsf{a}_1}$, for a gebit correspond
to the points, $\mathbf{u}$, on a unit 2-sphere and, likewise, the maximal
effects, $V_{\mathsf{a}_1}$, for a gebit correspond to the points, $\mathbf{v}$, on a unit 2-sphere
such that

$$\mathrm{Prob}(U^{\mathsf{a}_1} V_{\mathsf{a}_1}) = \tfrac{1}{2}(1 + \mathbf{u}\cdot\mathbf{v}) \tag{363}$$

Antipodal points correspond to distinguishable states in the case of 
states, and to a maximal measurement in the case of effects. 

This follows immediately from the results established in Sec. 11.1, T49, and
T62. This is the Bloch sphere associated with the qubit of quantum theory.
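As a numerical sanity check of (363), the sketch below compares the Bloch formula with the usual quantum overlap $|\langle v|u\rangle|^2$. It assumes the standard qubit parametrisation of pure states by polar angles, which is an assumption at this point of the argument (the correspondence with operators is only established in Sec. 12). All function names are illustrative.

```python
import cmath
import math

def qubit_state(theta, phi):
    """Amplitudes of the pure qubit state with Bloch polar angles (theta, phi)."""
    return (math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2))

def bloch_vector(theta, phi):
    """Unit vector on the 2-sphere with polar angle theta and azimuth phi."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def overlap_probability(state, effect):
    """|<effect|state>|^2, i.e. the trace of the product of the two projectors."""
    amp = sum(e.conjugate() * s for e, s in zip(effect, state))
    return abs(amp) ** 2

def bloch_formula(u, v):
    """Right-hand side of (363): (1 + u.v) / 2."""
    return 0.5 * (1 + sum(ui * vi for ui, vi in zip(u, v)))

angles_u, angles_v = (0.7, 1.9), (2.1, -0.4)
lhs = overlap_probability(qubit_state(*angles_u), qubit_state(*angles_v))
rhs = bloch_formula(bloch_vector(*angles_u), bloch_vector(*angles_v))

# Antipodal points (theta -> pi - theta, phi -> phi + pi) should be distinguishable.
antipodal = overlap_probability(
    qubit_state(0.7, 1.9), qubit_state(math.pi - 0.7, 1.9 + math.pi)
)
```

The two sides agree to numerical precision, and antipodal points give probability zero, consistent with the statement that they correspond to distinguishable states.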



12 Quantum theory reconstructed 

In this section we will complete the reconstruction of quantum theory. To do 
this we will use the machinery of the duotensor framework provided in Part Mil 
We will recover the following two mathematical axioms for quantum theory. 

Axiom 1 Operations correspond to operators. 

Axiom 2 Every complete set of physical operators corresponds to a complete 
set of operations. 

The operators here are understood to act on a complex Hilbert space. These
axioms were explained in Sec. 8.4.

12.1 Proving Axiom 1

First, we simply note that the space of Hermitian operators on an $N$ dimensional
complex Hilbert space is of dimension $N^2$. This follows from the fact that
we have $N^2$ real parameters for such Hermitian operators. Motivated by this
we choose a set of positive (and therefore Hermitian) operators, $\{\hat{X}_{\mathsf{a}_1}^{a_1} : a_1 =
1 \text{ to } N_a^2\}$, that span the space of Hermitian operators acting on an $N_a$ dimensional
Hilbert space, $\mathcal{H}_{N_a}$. These operators will be associated with the fiducial effects,
$X_{\mathsf{a}_1}^{a_1}$. We choose another set, $\{{}_{a_1}\hat{X}^{\mathsf{a}_1} : a_1 = 1 \text{ to } N_a^2\}$, which will be associated
with the fiducial preparations, ${}_{a_1}X^{\mathsf{a}_1}$. Recall that it follows from T1 that the
operation

A d^...fe _ d*es-fe Aaib2 ^ y^yg . . . ^ • • • f ^ (364) 

corresponds to the operator 

id 4 e 5 ...f 6 _ d 4 e 5 ...f 6 a yai yb 2 . . . Y c 3 , y^i Y^5 . . . . yh ClR^\ 

/i aib 2 ...c 3 — ■^■a 1 b 2 ...c 3 A 3l A b 2 ^c 3 AaA- fa A (OODJ 

if we have 

$${}_{a_1}\hat{X}^{\mathsf{a}_1}\hat{X}_{\mathsf{a}_1}^{a_1} = \mathrm{Prob}({}_{a_1}X^{\mathsf{a}_1}X_{\mathsf{a}_1}^{a_1}) \tag{366}$$

(i.e. equal hopping metrics for operations and operators). By definition, when 
we have such a correspondence then the probability is equal to the corresponding 
operator circuit. For example, 

Prob(A aib2 B c b f C aiC3a4 ) = A a ^B c b l 3 <C aiC33i (367) 

Hence, if we can satisfy (366) then we will have proven that operations
correspond to operators. This is Axiom 1 of quantum theory as given in Sec. 8.4.

We will show that we can satisfy condition (366) for a particular choice
of fiducials. First we note that a gebit is associated with each informational
subset $S_{\{m,n\}}$ (having $O(S_{\{m,n\}}) = \{m,n\}$) defined with respect to maximal
measurement $\{U_{\mathsf{a}_1}[n] : n = 1 \text{ to } N_a\}$. The pure states in a gebit lie on the surface
of a sphere (by T63). We will place an axis system in this sphere such that
the pure states $U^{\mathsf{a}_1}[m]$ and $U^{\mathsf{a}_1}[n]$ correspond to vectors pointing in the $+$ and
$-$ directions along the $z$-axis. Let $U^{\mathsf{a}_1}[mnx+]$ and $U^{\mathsf{a}_1}[mnx-]$ be preparations
corresponding to pure states pointing along the $+$ and $-$ directions along the
$x$-axis where $U_{\mathsf{a}_1}[mnx+]$ and $U_{\mathsf{a}_1}[mnx-]$ are the corresponding maximal results.
Sometimes we drop the "+" and simply write $U^{\mathsf{a}_1}[mnx]$ and $U_{\mathsf{a}_1}[mnx]$. We use
similar notation for the $y$-axis. We will prove

T64 There exists a maximal measurement

$$\{U_{\mathsf{a}_1}[n] : \text{all } n \neq m_1, n_1\} \cup \{U_{\mathsf{a}_1}[m_1 n_1 x+],\ U_{\mathsf{a}_1}[m_1 n_1 x-]\} \tag{368}$$

(where $m_1 \neq n_1$) identifying the maximal distinguishable set of
preparations

$$\{U^{\mathsf{a}_1}[n] : \text{all } n \neq m_1, n_1\} \cup \{U^{\mathsf{a}_1}[m_1 n_1 x+],\ U^{\mathsf{a}_1}[m_1 n_1 x-]\} \tag{369}$$

where this maximal measurement can be implemented by placing the
maximal measurement, $\{U_{\mathsf{a}_1}[m_1 n_1 x+],\ U_{\mathsf{a}_1}[m_1 n_1 x-]\}$, on the gebit
associated with $S_{\{m_1,n_1\}}$ after the special filter $F$ associated with the
maximal measurement

$$\{U_{\mathsf{a}_1}[n] : \text{all } n\} \tag{370}$$

having informational subset $S_{\{m_1,n_1\}}$ with $O(S_{\{m_1,n_1\}}) = \{m_1, n_1\}$.
Similar results hold if $x$ is replaced by $y$.

Consider the special filter followed by a maximal measurement on a gebit as
described above. We note that the preparations $U^{\mathsf{a}_1}[n]$ ($n \neq m_1, n_1$) in (369)
are identified by the special filter while the states corresponding to $U^{\mathsf{a}_1}[m_1 n_1 x\pm]$
pass unchanged through the filter and are then identified by the maximal
measurement on the gebit. Hence the states (369) can be distinguished and the
measurement (368) is maximal.

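Now consider a Hilbert space, $\mathcal{H}_{\mathsf{a}}$, spanned by a basis $\{|n\rangle_{\mathsf{a}} : n = 1 \text{ to } N_a\}$.
Define

$$|mnx\rangle_{\mathsf{a}} := \frac{1}{\sqrt{2}}\left(|m\rangle_{\mathsf{a}} + |n\rangle_{\mathsf{a}}\right) \quad\text{and}\quad |mny\rangle_{\mathsf{a}} := \frac{1}{\sqrt{2}}\left(|m\rangle_{\mathsf{a}} + i|n\rangle_{\mathsf{a}}\right) \tag{371}$$

Now we can define some effect operators (actually these are rank one projectors)
in an obvious notation

$$\hat{U}_{\mathsf{a}}[n] := |n\rangle_{\mathsf{a}}\langle n|, \quad \hat{U}_{\mathsf{a}}[mnx] := |mnx\rangle_{\mathsf{a}}\langle mnx|, \quad\text{and}\quad \hat{U}_{\mathsf{a}}[mny] := |mny\rangle_{\mathsf{a}}\langle mny| \tag{372}$$

The Hilbert space $\mathcal{H}^{\mathsf{a}}$ is spanned by a basis $\{|n\rangle^{\mathsf{a}} : n = 1 \text{ to } N_a\}$, so we can define
a corresponding set of preparation operators

$$\hat{U}^{\mathsf{a}}[n] := |n\rangle^{\mathsf{a}}\langle n|, \quad \hat{U}^{\mathsf{a}}[mnx] := |mnx\rangle^{\mathsf{a}}\langle mnx|, \quad\text{and}\quad \hat{U}^{\mathsf{a}}[mny] := |mny\rangle^{\mathsf{a}}\langle mny| \tag{373}$$

in an obvious notation. Here we are using $\mathsf{a}$ rather than $\mathsf{a}_1$ as a label. We will
sometimes include the integers and sometimes omit them as is convenient. We
define the set

$$\mathcal{F}_a = \{n : n = 1 \text{ to } N_a\} \cup \{mnx,\ mny : \text{all } m, n = 1 \text{ to } N_a \text{ with } m < n\} \tag{374}$$

We note that $|\mathcal{F}_a| = N_a^2$. With these definitions we can show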
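T65 If we choose fiducial sets of preparations and results

$$\{{}_{a_1}X^{\mathsf{a}_1} = U^{\mathsf{a}_1}[a_1] : a_1 \in \mathcal{F}_a\} \quad\text{and}\quad \{X_{\mathsf{a}_1}^{a_1} = U_{\mathsf{a}_1}[a_1] : a_1 \in \mathcal{F}_a\} \tag{375}$$

and we choose fiducial sets of operators

$$\{{}_{a_1}\hat{X}^{\mathsf{a}_1} = \hat{U}^{\mathsf{a}_1}[a_1] : a_1 \in \mathcal{F}_a\} \quad\text{and}\quad \{\hat{X}_{\mathsf{a}_1}^{a_1} = \hat{U}_{\mathsf{a}_1}[a_1] : a_1 \in \mathcal{F}_a\} \tag{376}$$

then

$$\mathrm{Tr}({}_{a_1}\hat{X}^{\mathsf{a}_1}\hat{X}_{\mathsf{a}_1}^{a_1}) = \mathrm{Prob}({}_{a_1}X^{\mathsf{a}_1}X_{\mathsf{a}_1}^{a_1}) := {}_{a_1}g^{a_1} \tag{377}$$

and, further, the matrix ${}_{a_1}g^{a_1}$ is invertible (which means that the
fiducial states form a spanning linearly independent set as do the
fiducial effects).

Consider the case $N_a = 3$. We can show that

$$
\mathrm{Prob}({}_{a_1}X^{\mathsf{a}_1}X_{\mathsf{a}_1}^{a_1}) := {}_{a_1}g^{a_1} =
\begin{pmatrix}
1 & 0 & 0 & h & h & h & h & 0 & 0\\
0 & 1 & 0 & h & h & 0 & 0 & h & h\\
0 & 0 & 1 & 0 & 0 & h & h & h & h\\
h & h & 0 & 1 & h & q & q & q & q\\
h & h & 0 & h & 1 & q & q & q & q\\
h & 0 & h & q & q & 1 & h & q & q\\
h & 0 & h & q & q & h & 1 & q & q\\
0 & h & h & q & q & q & q & 1 & h\\
0 & h & h & q & q & q & q & h & 1
\end{pmatrix}
\tag{378}
$$

where $h = \frac{1}{2}$ and $q = \frac{1}{4}$. Here we have ordered the rows and columns according
to

$$1,\ 2,\ 3,\ 12x,\ 12y,\ 13x,\ 13y,\ 23x,\ 23y \tag{379}$$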



The 1's down the diagonal of (378) follow since each fiducial effect identifies
the corresponding fiducial preparation. The $h$ in position $(1, 12x)$ follows from
T63 and the fact that the $U^{\mathsf{a}}[1]$ state corresponds to a vector pointing along
the $z$-axis whereas the $U_{\mathsf{a}}[12x]$ effect corresponds to a vector pointing along the
$x$-axis. All the other $h$'s follow for similar reasons. The $q$ in position $(12x, 23y)$
corresponds to

$$\mathrm{Prob}(U^{\mathsf{a}_1}[12x]\,U_{\mathsf{a}_1}[23y]) \tag{380}$$

We can think of this as the preparation of a gebit state $U^{\mathsf{a}_1}[12x]$ in the
informational subspace $S_{\{1,2\}}$. Now we know that

$$\mathrm{Prob}(U^{\mathsf{a}_1}[1]\,U_{\mathsf{a}_1}[23y]) = 0 \quad\text{and}\quad \mathrm{Prob}(U^{\mathsf{a}_1}[2]\,U_{\mathsf{a}_1}[23y]) = \frac{1}{2} \tag{381}$$

The first equation follows from the fact that the maximal effect, $U_{\mathsf{a}_1}[23y]$,
pertains to the $S_{\{2,3\}}$ informational subspace and so, by T64, we can put
$U_{\mathsf{a}_1}[23y] = F\,U_{\mathsf{a}_1}[23y]$ where $F$ is a special filter corresponding to $S_{\{2,3\}}$. The
second equation follows from T63. It now follows by T52 that, for states
restricted to $S_{\{1,2\}}$, the result $U_{\mathsf{a}_1}[23y]$ is equivalent to $\frac{1}{2}U_{\mathsf{a}_1}[2]$. Thus, using (381),
we get that the probability in (380) is equal to $q$. All the $q$'s in (378) follow for
similar reasons. The 0's in (378) follow immediately from T64. It is a simple
matter to see that, with the choices in (376), the matrix $\mathrm{Tr}({}_{a_1}\hat{X}^{\mathsf{a}_1}\hat{X}_{\mathsf{a}_1}^{a_1})$ is the
same as in (378). For $N_a > 3$ all the entries can be deduced by similar reasoning
and (377) is true in general. The fiducial operators clearly form a spanning set
(for Hermitian operators acting on $\mathcal{H}^{\mathsf{a}}$). From this it follows immediately that
${}_{a_1}g^{a_1}$ is invertible. This proves T65.
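The operator side of this argument can be checked numerically. The sketch below builds the rank-one projectors of (372) and (373) for the labels in (374), computes the matrix of traces, and checks the entries and invertibility for $N_a = 3$. It assumes NumPy is available; the function names `fiducial_projectors` and `gram_matrix` are illustrative and not taken from the paper. It checks only the trace formula, not the operational probabilities derived from the postulates.

```python
import itertools
import numpy as np

def fiducial_projectors(n):
    """Projectors labelled as in (374): n, then mnx and mny for m < n."""
    basis = np.eye(n, dtype=complex)
    labels, ops = [], []
    for k in range(n):
        labels.append(str(k + 1))
        ops.append(np.outer(basis[k], basis[k].conj()))
    for m, k in itertools.combinations(range(n), 2):
        # x uses (|m> + |n>)/sqrt(2); y uses (|m> + i|n>)/sqrt(2), as in (371).
        for tag, phase in (("x", 1), ("y", 1j)):
            vec = (basis[m] + phase * basis[k]) / np.sqrt(2)
            labels.append(f"{m + 1}{k + 1}{tag}")
            ops.append(np.outer(vec, vec.conj()))
    return labels, ops

def gram_matrix(ops):
    """Matrix of traces Tr(A B) over all pairs of projectors."""
    return np.array([[np.trace(a @ b).real for b in ops] for a in ops])

labels, ops = fiducial_projectors(3)
g = gram_matrix(ops)
rank = np.linalg.matrix_rank(g)
```

For $N_a = 3$ the labels come out in the order (379), there are $N_a + 2\binom{N_a}{2} = N_a^2 = 9$ of them, and the matrix has full rank, so it is invertible.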

This result, together with T1, proves that Axiom 1 above follows from the
postulates.

12.2 Operators for pure states and maximal effects 



We can now prove

T66 Every operator of the form

$$\hat{A}^{\mathsf{a}} = |\psi\rangle^{\mathsf{a}}\langle\psi| \quad\text{where}\quad \langle\psi|\psi\rangle = 1 \tag{382}$$

corresponds to a pure preparation and every operator of the form

$$\hat{B}_{\mathsf{a}} = |\varphi\rangle_{\mathsf{a}}\langle\varphi| \quad\text{where}\quad \langle\varphi|\varphi\rangle = 1 \tag{383}$$

corresponds to a maximal result.

We will prove this by induction. First we note that this is trivially true for a
system having $N_a = 1$. We prove that, if this is true for a system of type $\mathsf{b}$ having
a particular value of $N_b = N$, then it is also true for a system of type $\mathsf{a}$ having
$N_a = N + 1$. Assume we have a maximal set of $N + 1$ distinguishable states for
$\mathsf{a}$, labeled by $n = 1$ to $N + 1$. According to T65 we can generate a fiducial set of
operators from a basis $|n\rangle^{\mathsf{a}}$ where these $N + 1$ distinguishable states correspond
to the operators $|n\rangle^{\mathsf{a}}\langle n|$. Consider a special filter, $F$, associated with the maximal
measurement that distinguishes these states having $O(S) = \{n : n = 1 \text{ to } N\}$.
This will produce a system, $\mathsf{b}$, having $N_b = N$. The first $N$ states in the
above maximal distinguishable set for $\mathsf{a}$ will pass through the filter unchanged
and constitute a maximal distinguishable set for $\mathsf{b}$. Hence, we can generate a
fiducial set of operators as in T65 from $|n\rangle^{\mathsf{a}}$ (for $n = 1$ to $N$). In this case,
$|n\rangle^{\mathsf{a}}\langle n|$ (for $n = 1$ to $N$) correspond to the $N$ distinguishable states for $\mathsf{b}$. For
the $|\psi\rangle^{\mathsf{a}}$ in (382), we can write

$$|\psi\rangle^{\mathsf{a}} = a|\tilde{\psi}\rangle^{\mathsf{a}} + b|N+1\rangle^{\mathsf{a}} \quad\text{where}\quad \langle N+1|\tilde{\psi}\rangle = 0, \quad |a|^2 + |b|^2 = 1 \tag{384}$$

We can consider the operator $|\tilde{\psi}\rangle^{\mathsf{a}}\langle\tilde{\psi}|$. This is in the space spanned by the
fiducial operators for $\mathsf{b}$ (since $\mathrm{Tr}(|\tilde{\psi}\rangle^{\mathsf{a}}\langle\tilde{\psi}|N+1\rangle^{\mathsf{a}}\langle N+1|) = 0$). Therefore, by
supposition, there exists a pure preparation corresponding to $|\tilde{\psi}\rangle^{\mathsf{a}}\langle\tilde{\psi}|$ since we
are proceeding by induction and hence are assuming that T66 is true for $\mathsf{b}$.
Since this preparation is pure, we know by T12 that it belongs to a maximal
set of $N$ distinguishable preparations for the system of type $\mathsf{b}$. We are free to let
the pure preparation corresponding to $|\tilde{\psi}\rangle^{\mathsf{a}}\langle\tilde{\psi}|$ be the $N$th of the distinguishable
preparations employed above. Hence $|\tilde{\psi}\rangle^{\mathsf{a}} = |N\rangle^{\mathsf{a}}$. We can now focus on the
informational subset containing the states associated with the operators $|N\rangle^{\mathsf{a}}\langle N|$
and $|N+1\rangle^{\mathsf{a}}\langle N+1|$. This is a gebit. It is a standard result that, with the
trace formula, pure states on the Bloch sphere correspond to superpositions
$a|N\rangle + b|N+1\rangle$. Since we established that the states on the Bloch sphere exist
in T63, we have proven that the vector in (384) corresponds to a pure state.
This proves the first part of T66 by induction. To establish the second part we
note that maximal effects must be represented by positive operators, since
$\mathrm{Tr}(|\psi\rangle^{\mathsf{a}}\langle\psi|\hat{B}_{\mathsf{a}})$ is a probability and so must be positive for all $|\psi\rangle^{\mathsf{a}}$. Further, we
know by P1 that there is a unique maximal effect identifying the pure state
represented by $|\varphi\rangle^{\mathsf{a}}\langle\varphi|$ and that this maximal effect does not identify any other
pure state. The only positive operator doing this when we take the trace is
$|\varphi\rangle_{\mathsf{a}}\langle\varphi|$. The second part of T66 follows from these facts.

12.3 The deterministic effect



First we will show that 

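T67 The deterministic effect, $T_{\mathsf{a}_1}$, corresponds to the identity
operator, $\hat{I}_{\mathsf{a}_1}$, acting on $\mathcal{H}_{\mathsf{a}_1}$.

The deterministic effect is given by coarse-graining over the outcomes of any
measurement (we are implicitly using the uniqueness property in T18). For
example, we can write

$$T_{\mathsf{a}_1} = \sum_{n=1}^{N_a} U_{\mathsf{a}_1}[n] \tag{385}$$

We can choose the effects on the RHS to correspond to the maximal measurement
used to generate the fiducial set. In this case we get

$$\hat{T}_{\mathsf{a}_1} = \sum_{n=1}^{N_a} |n\rangle_{\mathsf{a}_1}\langle n| = \hat{I}_{\mathsf{a}_1} \tag{386}$$

This proves T67.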

12.4 Operators are physical 

In Sec. 8.2 we considered operator supersets which were subsets of all possible
operators that might, for example, be induced by taking the subset of operators
that correspond to operations. Recall that an operator superset is defined to
be physical if: (1) the operator circuit formed from operators in the superset
is between 0 and 1; (2) the operator superset contains preparation and effect
operators equal to all rank one projectors for every type; and (3) the operator
superset contains result operators corresponding to the identity operator, $\hat{I}_{\mathsf{a}_1}$,
for every type. We can now prove that

T68 All operators corresponding to operations are physical. 

The definition of physical operators is given in Sec. 8.2. The operator superset
obtained by taking all operators that correspond to operations must be physical
because: (1) the operator circuit is equal to the probability of a circuit by T1;
(2) all preparation and result operators equal to rank one projectors belong to
the superset by T66; (3) the identity operator belongs to the superset by T67.
T68 then follows from T5.

By theorems T3 and T4 from Sec. 8.2 it follows that preparation operators
must be physical and have trace less than or equal to one, effect operators must
be positive and be less than or equal to the identity, and the transformations
associated with operations must be completely positive and trace non-increasing. We
have, then, recovered a substantial part of quantum theory. We need, however,
to prove a few more results to obtain Axiom 2 (that every complete set of
physical operators corresponds to a complete set of operations).




12.5 All projection valued measures possible

We will now prove that 

T69 There exists a maximal measurement, $\{U_{\mathsf{a}}[n] : n = 1 \text{ to } N_a\}$,
corresponding to any set $\{|n\rangle_{\mathsf{a}}\langle n| : n = 1 \text{ to } N_a\}$ where $|n\rangle_{\mathsf{a}}$ is an
orthonormal basis in $\mathcal{H}_{\mathsf{a}}$. Further, the maximal set of distinguishable
states, $\{U^{\mathsf{a}}[n] : n = 1 \text{ to } N_a\}$, corresponds to the set $\{|n\rangle^{\mathsf{a}}\langle n| : n =
1 \text{ to } N_a\}$.

We know from T66 that there is a pure state corresponding to each of the
projectors $|n\rangle^{\mathsf{a}}\langle n|$. Consider the projector $|N\rangle^{\mathsf{a}}\langle N|$. This state must belong
to some maximal distinguishable set of states (by T12). We can construct
a special filter, $F_N$, that picks off just this pure state and allows the remaining
states in the maximal distinguishable set to pass through unchanged. From the
properties of the special filter we now have a maximal effect corresponding to
$|N\rangle_{\mathsf{a}}\langle N|$. Each of the states associated with $|n\rangle^{\mathsf{a}}\langle n|$ for $n = 1$ to $N - 1$ must pass
through the filter because $\mathrm{Tr}(|n\rangle^{\mathsf{a}}\langle n|N\rangle_{\mathsf{a}}\langle N|) = 0$ and so they must belong to
the informational subset, $S_N$, associated with $F_N$. We can iterate this process,
picking off the state corresponding to $|N-1\rangle^{\mathsf{a}}\langle N-1|$ with a special filter $F_{N-1}$
and so on, and obtain a maximal effect corresponding to $|N-1\rangle_{\mathsf{a}}\langle N-1|$. In
this way we are able to construct a maximal measurement corresponding to
$\{|n\rangle_{\mathsf{a}}\langle n| : n = 1 \text{ to } N_a\}$ distinguishing the states corresponding to $\{|n\rangle^{\mathsf{a}}\langle n| : n =
1 \text{ to } N_a\}$. This proves T69.

12.6 All unitary transformations possible 

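We will now show that we can obtain reversible transformations corresponding
to arbitrary unitary operations. First we prove

T70 If we have a pure state for a system represented by the operator
$\hat{A}^{\mathsf{a}} = |\psi\rangle^{\mathsf{a}}\langle\psi|$ then, under a reversible transformation on the system,
$|\psi\rangle^{\mathsf{a}}$ evolves linearly.

Recall from Sec. 7.5 (equation (130) in particular) that the transformation on a
state represented by an operator $\hat{A}^{\mathsf{a}_1}$ due to an operation is given by $\hat{A}^{\mathsf{a}_1}\hat{B}_{\mathsf{a}_1}^{\mathsf{a}_2}$.
It follows from T68 that any operator, $\hat{B}_{\mathsf{a}_1}^{\mathsf{a}_2}$, must be physical and from T6 that
any physical operator corresponds to a completely positive trace non-increasing
map, $\$_B(\cdot)$. Further, we can write any such map in Kraus form [59]:

$$\$_B(\hat{A}^{\mathsf{a}_1}) = \hat{B}_{\mathsf{a}_1}^{\mathsf{a}_2}\hat{A}^{\mathsf{a}_1} = \sum_i \hat{E}_{\mathsf{a}_1}^{\mathsf{a}_2}[i]\,\hat{A}^{\mathsf{a}_1}\,\hat{E}_{\mathsf{a}_1}^{\mathsf{a}_2}[i]^\dagger \tag{387}$$

where $\hat{E}_{\mathsf{a}_1}^{\mathsf{a}_2}[i]$ are linear operators that act on vectors in $\mathcal{H}^{\mathsf{a}}$ and return vectors
in $\mathcal{H}^{\mathsf{a}}$ having the property

$$\sum_i \hat{E}_{\mathsf{a}_1}^{\mathsf{a}_2}[i]^\dagger\,\hat{E}_{\mathsf{a}_1}^{\mathsf{a}_2}[i] \le \hat{I}^{\mathsf{a}} \tag{388}$$

where $\hat{I}^{\mathsf{a}}$ is the identity operator on $\mathcal{H}^{\mathsf{a}}$. If $\hat{B}_{\mathsf{a}_1}^{\mathsf{a}_2}$ corresponds to a reversible
transformation and it is applied to a pure state, then the state afterwards will
be pure. Hence,

$$\hat{B}_{\mathsf{a}_1}^{\mathsf{a}_2}|\psi\rangle^{\mathsf{a}_1}\langle\psi| = |\psi'\rangle^{\mathsf{a}_2}\langle\psi'| \tag{389}$$

(where, as usual, we are implicitly taking the partial trace over the $\mathsf{a}_1$ space on
the LHS). If we represent the transformation in Kraus form then the only way
this is possible is if $i$ only takes one value (so we drop the sum in (387)) and
hence we see that $|\psi'\rangle^{\mathsf{a}_2} = \hat{E}_{\mathsf{a}_1}^{\mathsf{a}_2}|\psi\rangle^{\mathsf{a}_1}$, which is a linear transformation.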
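The role of the single Kraus operator can be illustrated numerically. The sketch below, which assumes NumPy, compares a map with one (unitary) Kraus operator to a two-operator dephasing map acting on the same pure state. The specific operators (a Hadamard matrix and the two computational-basis projectors) are illustrative choices made here, not examples from the paper, and the comparison is an illustration rather than a proof.

```python
import numpy as np

def apply_kraus(kraus_ops, rho):
    """Apply the map rho -> sum_i E_i rho E_i^dagger."""
    return sum(e @ rho @ e.conj().T for e in kraus_ops)

def purity(rho):
    """Tr(rho^2); equal to 1 exactly when a normalised state is pure."""
    return float(np.real(np.trace(rho @ rho)))

# Pure input state |+> = (|0> + |1>)/sqrt(2).
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho_in = np.outer(plus, plus.conj())

# One Kraus operator (a unitary): the output stays pure.
hadamard = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
purity_single = purity(apply_kraus([hadamard], rho_in))

# Two Kraus operators (complete dephasing): the output is mixed.
dephasing = [np.diag([1, 0]).astype(complex), np.diag([0, 1]).astype(complex)]
purity_double = purity(apply_kraus(dephasing, rho_in))
```

Here the one-operator map keeps the purity at 1, whereas the dephasing map takes $|+\rangle$ to the maximally mixed state with purity $\frac{1}{2}$.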
We now prove 

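T71 We can construct a transformation, $B$, corresponding to an
arbitrary unitary transformation, $\hat{U}_B$, such that

$$\hat{B}_{\mathsf{a}_1}^{\mathsf{a}_2}\hat{A}^{\mathsf{a}_1} = \hat{U}_B\,\hat{A}^{\mathsf{a}}\,\hat{U}_B^\dagger \tag{390}$$

for all operators, $\hat{A}^{\mathsf{a}_1}$, representing states.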

Any unitary transformation on $\mathcal{H}^{\mathsf{a}}$ can be specified by giving a linear map from
one orthonormal basis set, $\{|u[n]\rangle^{\mathsf{a}} : n = 1 \text{ to } N_a\}$, to a new one, $\{|v[n]\rangle^{\mathsf{a}} :
n = 1 \text{ to } N_a\}$. By T69 we know that the projectors corresponding to any
orthonormal basis set of $\mathcal{H}^{\mathsf{a}}$ form a maximal distinguishable set of states. Hence,
by T30 we know that there exists a reversible map that takes the pure states
$|u[n]\rangle^{\mathsf{a}}\langle u[n]|$ and maps them to $|v[n]\rangle^{\mathsf{a}}\langle v[n]|$. Using T70 this means we have a
linear map, $\hat{f}_B$, which performs the transformation

$$\hat{f}_B|u[n]\rangle^{\mathsf{a}} = e^{i\theta[n]}|v[n]\rangle^{\mathsf{a}} \quad\text{for } n = 1 \text{ to } N_a \tag{391}$$

We must have the $\exp(i\theta[n])$ phase factors since they cancel when we form the
projectors $|v[n]\rangle^{\mathsf{a}}\langle v[n]|$. T30 does not fix the values of the $\theta[n]$ since it only
guarantees the existence of some reversible transformation mapping between
these sets of states. The $\theta[n]$ terms are important since (as we know from
quantum theory) they give rise to interference when the transformation is applied
to a general state. However, whatever values $\theta[n]$ take, we will show that we
can construct a transformation that maps these phases to zero (note that
the vectors $|v[n]\rangle^{\mathsf{a}}$ can already have an arbitrary phase absorbed into them).
Hence, we can perform an arbitrary unitary. To construct this, first we note
that we can construct an arbitrary unitary for a gebit (a qubit). We know that,
in an appropriate representation, pure states are represented by the points on a
2-sphere. By T31 we know that there exists a reversible transformation taking
any point on the sphere to any other point. In other words, the group of
reversible transformations must be transitive on the 2-sphere. There is only one
such group [551 E] which also corresponds to a completely positive map (we
know by T68 and T4 that this must be a completely positive map) and this is
$SO(3)$ (i.e. the group of special orthogonal transformations). Note that we need $SO(3)$
rather than $O(3)$ because the map must be completely positive. It is
well known that special orthogonal transformations on the Bloch sphere
correspond to unitary transformations in the Hilbert space for a qubit. Hence, we
can obtain an arbitrary unitary for a qubit. One particular such transformation
is the phase gate:

$$|0\rangle^{\mathsf{b}} \to |0\rangle^{\mathsf{b}}, \qquad |1\rangle^{\mathsf{b}} \to e^{i\phi}|1\rangle^{\mathsf{b}} \tag{392}$$

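(this is a rotation about the $z$ axis by an angle $\phi$). If this is applied to $N_a$
qubits then we have

$$
\begin{aligned}
|100\ldots0\rangle &\to e^{i\phi[1]}|100\ldots0\rangle\\
|010\ldots0\rangle &\to e^{i\phi[2]}|010\ldots0\rangle\\
|001\ldots0\rangle &\to e^{i\phi[3]}|001\ldots0\rangle\\
&\ \ \vdots\\
|000\ldots1\rangle &\to e^{i\phi[N_a]}|000\ldots1\rangle
\end{aligned}
\tag{393}
$$

where $\phi[n]$ is the phase applied to the $n$th qubit. Note that we are implicitly
invoking the fact that it follows from correspondence that a product preparation
of qubits corresponds to a product of preparation operators. For example,

$$|1\rangle\langle1| \otimes |0\rangle\langle0| \otimes |0\rangle\langle0| \otimes \cdots \otimes |0\rangle\langle0| := |100\ldots0\rangle\langle100\ldots0| \tag{394}$$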
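The claim that the phase gate (392) is a rotation about the $z$ axis can be checked directly on Bloch vectors. The following sketch uses only the Python standard library and assumes the standard convention $x = 2\,\mathrm{Re}(\bar{a}b)$, $y = 2\,\mathrm{Im}(\bar{a}b)$, $z = |a|^2 - |b|^2$ for the state $a|0\rangle + b|1\rangle$; the function names and the sample state are illustrative.

```python
import cmath
import math

def bloch_vector(state):
    """Bloch vector (x, y, z) of a normalised qubit state a|0> + b|1>."""
    a, b = state
    coherence = a.conjugate() * b
    return (2 * coherence.real, 2 * coherence.imag, abs(a) ** 2 - abs(b) ** 2)

def phase_gate(state, phi):
    """Apply (392): |0> -> |0>, |1> -> exp(i phi)|1>."""
    a, b = state
    return (a, cmath.exp(1j * phi) * b)

def rotate_about_z(vec, phi):
    """Rotate a 3-vector by the angle phi about the z axis."""
    x, y, z = vec
    return (x * math.cos(phi) - y * math.sin(phi),
            x * math.sin(phi) + y * math.cos(phi),
            z)

state = (complex(math.cos(0.3)), cmath.exp(0.7j) * math.sin(0.3))
phi = 1.1
after_gate = bloch_vector(phase_gate(state, phi))
after_rotation = rotate_about_z(bloch_vector(state), phi)
```

Applying the gate and then reading off the Bloch vector gives the same result as rotating the original Bloch vector by $\phi$ about $z$.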

In (393) we have just considered the cases where one qubit is prepared in the 1
state and the remainder are prepared in the 0 states. Of course there are other
terms such as $|110\ldots1\rangle$ that accumulate a more complicated phase which we
will not make particular use of. There are $N_a$ qubits and, hence, a total of $2^{N_a}$
states in the maximal distinguishable set associated with these qubits. We are
particularly interested in the $N_a$ of these appearing above, namely $\{|100\ldots0\rangle,
|010\ldots0\rangle, |001\ldots0\rangle, \ldots, |000\ldots1\rangle\}$, which we will label by $m = 1$ to $N_a$ and
represent as $|m\rangle^{\mathsf{q}}$. We will label the $|000\ldots0\rangle$ state by $|0\rangle^{\mathsf{q}}$. The remaining
states in the computational basis for the qubits can be labeled by $m = N_a + 1$ to
$2^{N_a} - 1$. Here $\mathsf{q}$ stands for the system type comprised of $N_a$ qubits. We will show
that we can perform an arbitrary unitary on a system of type $\mathsf{a}$ by applying the
transformation



[Circuit diagram involving $P$, the phase gates $\phi[n]$, and $Q$] (395)



The unlabeled wires are qubits. Applying P4', we choose the transformation
$P$ to implement a permutation, $\pi_P$, with respect to the maximal set of
distinguishable states formed by $|u[n]\rangle^{\mathsf{a}}|m\rangle^{\mathsf{q}}$ where

$$\pi_P \left\{ \begin{array}{l} nm \leftrightarrow mn \quad\text{for } n, m = 1 \text{ to } N_a\\ \text{other } nm \text{ not permuted} \end{array} \right. \tag{396}$$

This basically swaps the incoming state onto the space spanned by the first $N_a$
states of the qubits. Applying P4', we choose $Q$ to implement a permutation,
$\pi_Q$, of the maximal distinguishable set of states formed by $|v[n]\rangle^{\mathsf{a}}|m\rangle^{\mathsf{q}}$ where

$$\pi_Q \left\{ \begin{array}{l} nm \leftrightarrow mn \quad\text{for } n, m = 1 \text{ to } N_a\\ \text{other } mn \text{ not permuted} \end{array} \right. \tag{397}$$

This basically swaps the state back onto a system of type $\mathsf{a}$. The phase rotations,
$\phi[n]$, do not affect the distinguishable set of states $|m\rangle^{\mathsf{q}}\langle m|$ since the phase
cancels. Further, the phase rotations are all reversible transformations. Hence
we can apply exactly the same reasoning as in proving T30 (with a slight but
obvious elaboration to deal with the reversible transformation due to phase
gates) to prove the transformation in (395) is reversible and maps the states
$\{|u[n]\rangle^{\mathsf{a}}\langle u[n]| : n = 1 \text{ to } N_a\}$ to the states $\{|v[n]\rangle^{\mathsf{a}}\langle v[n]| : n = 1 \text{ to } N_a\}$. Since
the transformation is reversible it follows from T70 that it corresponds to the
linear evolution in $\mathcal{H}^{\mathsf{a}}$ given in (391). We can choose the $\phi[n]$ in (395) to cancel
the $\theta[n]$ and hence achieve an arbitrary unitary transformation. This proves
T71.



12.7 Proving Axiom 2 

We now prove

T72 For any complete set of physical operators, $\{\hat{B}_{\mathsf{a}_1}^{\mathsf{b}_2}[l] : l = 1 \text{ to } L\}$,
a corresponding complete set of operations $\{B_{\mathsf{a}_1}^{\mathsf{b}_2}[l] : l = 1 \text{ to } L\}$ is
given by

[Circuit diagram involving $P$, $Q$, the phase gates, the preparation $V[1]$, and the deterministic effect $T$] (398)

where $P$ and $Q$ are appropriately chosen reversible permutation
transformations, $\{\phi[n] : n = 1 \text{ to } N_a\}$ are appropriately chosen phases,
$\mathsf{c}$ and $\mathsf{d}$ are ancillary systems having appropriate $N_c$ and $N_d$, and
$V^{\mathsf{b}_2\mathsf{c}_3\mathsf{d}_4}[1]$ is an appropriately chosen preparation (for a pure state).
The unlabeled wires represent qubits. $T$ is the deterministic effect.

We know from T6 that a complete set of physical operators, $\{\hat{B}_{\mathsf{a}_1}^{\mathsf{b}_2}[l] : l = 1 \text{ to } L\}$,
can be associated with a set of superoperators, $\{\$_B^l(\cdot) : l = 1 \text{ to } L\}$, that are
completely positive and whose sum is trace preserving. We can write these
superoperators in Kraus form [59]:



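$$\$_B^l(\hat{A}^{\mathsf{a}_1}) = \hat{B}_{\mathsf{a}_1}^{\mathsf{b}_2}[l]\,\hat{A}^{\mathsf{a}_1} = \sum_i \hat{E}_{\mathsf{a}_1}^{\mathsf{b}_2}[li]\,\hat{A}^{\mathsf{a}_1}\,\hat{E}_{\mathsf{a}_1}^{\mathsf{b}_2}[li]^\dagger \tag{399}$$

where $\hat{E}_{\mathsf{a}_1}^{\mathsf{b}_2}[li]$ is any set of operators that act on vectors in $\mathcal{H}^{\mathsf{a}}$ to return vectors
in $\mathcal{H}^{\mathsf{b}}$ having the property

$$\sum_{li} \hat{E}_{\mathsf{a}_1}^{\mathsf{b}_2}[li]^\dagger\,\hat{E}_{\mathsf{a}_1}^{\mathsf{b}_2}[li] = \hat{I}^{\mathsf{a}} \tag{400}$$

where $\hat{I}^{\mathsf{a}}$ is the identity operator on $\mathcal{H}^{\mathsf{a}}$. We define

$$|v[n]\rangle^{\mathsf{b}_2\mathsf{c}_3\mathsf{d}_4} := \sum_{li} \hat{E}_{\mathsf{a}_1}^{\mathsf{b}_2}[li]\,|u[n]\rangle^{\mathsf{a}_1}|l\rangle^{\mathsf{c}_3}|i\rangle^{\mathsf{d}_4} \tag{401}$$

We see that these states are orthonormal:

$$\langle v[n']|v[n]\rangle = \sum_{li} \langle u[n']|\hat{E}_{\mathsf{a}_1}^{\mathsf{b}_2}[li]^\dagger\,\hat{E}_{\mathsf{a}_1}^{\mathsf{b}_2}[li]|u[n]\rangle = \delta_{n'n} \tag{402}$$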

where we have used (400). We can complete this set of $N_a$ orthonormal states
for $\mathsf{bcd}$ into an orthonormal basis set, $\{|v[n]\rangle^{\mathsf{bcd}} : n = 1 \text{ to } N_{bcd}\}$, by adding a
further $N_{bcd} - N_a$ basis states (any set that gives a basis will do). By T69 we
know that projectors onto these states comprise a maximal distinguishable set
of states. We will use the same notation for the $N_a$ qubits as in the proof of
T71. By P4' we can choose the transformation $P$ to be a reversible permutation
that permutes the maximal distinguishable set of states corresponding to the
operators $|u[n]\rangle^{\mathsf{a}}\langle u[n]| \otimes |m\rangle^{\mathsf{q}}\langle m|$ according to the permutation

$$\pi_P \left\{ \begin{array}{l} nm \leftrightarrow mn \quad\text{for } n, m = 1 \text{ to } N_a\\ \text{other } nm \text{ not permuted} \end{array} \right. \tag{403}$$

This basically swaps the state of $\mathsf{a}$ onto the qubits. Since we have a reversible
transformation we have, by T70,

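$$|u[n]\rangle^{\mathsf{a}}|1\rangle^{\mathsf{q}} \to \exp(i\theta_P[n])\,|u[1]\rangle^{\mathsf{a}}|n\rangle^{\mathsf{q}} \tag{404}$$

after the $P$ transformation. Next, the $\mathsf{a}$ system is detected with certainty at
the $U_{\mathsf{a}}[1]$ effect (as its state is $|u[1]\rangle^{\mathsf{a}}\langle u[1]|$). If we take the circuit trace we can
eliminate the space associated with $\mathsf{a}$. The qubits each pass through a phase
gate and the state becomes $\exp i(\theta_P[n] + \phi[n])\,|n\rangle^{\mathsf{q}}$ (see the proof of T71). Hence,
we have

$$\exp i(\theta_P[n] + \phi[n])\,|v[1]\rangle^{\mathsf{bcd}}|n\rangle^{\mathsf{q}} \tag{405}$$

impinging on the transformation $Q$. Invoking P4' we can choose the
transformation $Q$ to be a reversible permutation that permutes the maximal
distinguishable set of states associated with $|v[n]\rangle^{\mathsf{bcd}}\langle v[n]| \otimes |m\rangle^{\mathsf{q}}\langle m|$ according to the
permutation

$$\pi_Q \left\{ \begin{array}{l} nm \leftrightarrow mn \quad\text{for } n, m = 1 \text{ to } N_a\\ \text{other } nm \text{ not permuted} \end{array} \right. \tag{406}$$

The state after this transformation is, hence,

$$\exp i(\theta_P[n] + \phi[n] + \theta_Q[n])\,|v[n]\rangle^{\mathsf{bcd}}|1\rangle^{\mathsf{q}} \tag{407}$$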

Since the outgoing state of the qubits is that associated with $|100\ldots0\rangle$, the
effects for the qubits after $Q$ all fire with certainty. If we put the state in
projector form and take the circuit trace, we eliminate the $\mathsf{q}$ part of the state.
We are left with the state $\exp i(\theta_P[n] + \phi[n] + \theta_Q[n])\,|v[n]\rangle^{\mathsf{bcd}}$ for $\mathsf{bcd}$. We choose
$\phi[n] = -(\theta_P[n] + \theta_Q[n])$. This means that if we send $|u[n]\rangle^{\mathsf{a}}$ in, we get $|v[n]\rangle^{\mathsf{bcd}}$
out for $\mathsf{bcd}$. That is,

$$|u[n]\rangle^{\mathsf{a}_1} \to \sum_{li} \hat{E}_{\mathsf{a}_1}^{\mathsf{b}_2}[li]\,|u[n]\rangle^{\mathsf{a}_1}|l\rangle^{\mathsf{c}_3}|i\rangle^{\mathsf{d}_4} \tag{408}$$

The transformation up to this stage is reversible by the same reasoning as we
used in proving T30 (we can build a transformation that does each of the steps
so far discussed in reverse). Hence by T70

$$|\psi\rangle^{\mathsf{a}_1} := \sum_n c_n|u[n]\rangle^{\mathsf{a}_1} \to \sum_{li} \hat{E}_{\mathsf{a}_1}^{\mathsf{b}_2}[li]\,|\psi\rangle^{\mathsf{a}_1}|l\rangle^{\mathsf{c}_3}|i\rangle^{\mathsf{d}_4} \tag{409}$$

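A general positive operator can be written as

$$\hat{A}^{\mathsf{a}_1} = \sum_k \alpha_k|\psi_k\rangle^{\mathsf{a}_1}\langle\psi_k| \tag{410}$$

(for some real coefficients $\alpha_k$) and will, therefore, evolve according to

$$\hat{A}^{\mathsf{a}_1} \to \sum_{li,\,l'i'} \hat{E}_{\mathsf{a}_1}^{\mathsf{b}_2}[li]\,\hat{A}^{\mathsf{a}_1}\,\hat{E}_{\mathsf{a}_1}^{\mathsf{b}_2}[l'i']^\dagger \otimes |l\rangle^{\mathsf{c}}\langle l'| \otimes |i\rangle^{\mathsf{d}}\langle i'| \tag{411}$$

Finally, we have effects in the $\mathsf{c}$ and $\mathsf{d}$ paths. The operator associated with
the effect in path $\mathsf{c}$ is, by T69, equal to $|l\rangle_{\mathsf{c}}\langle l|$. The deterministic effect, $T_{\mathsf{d}}$, is
associated with the identity operator, $\hat{I}_{\mathsf{d}}$ (by T67). If we take the circuit trace
after these effects have been included then we obtain (when we have outcome $l$)

$$\sum_i \hat{E}_{\mathsf{a}_1}^{\mathsf{b}_2}[li]\,\hat{A}^{\mathsf{a}_1}\,\hat{E}_{\mathsf{a}_1}^{\mathsf{b}_2}[li]^\dagger \tag{412}$$

which is what we had in (399). Hence the complete set of operations in (398)
can have any corresponding complete set of physical operators with appropriate
choices of $P$, $Q$, $\mathsf{c}$, $\mathsf{d}$, $V^{\mathsf{bcd}}[1]$ and $\{\phi[n]\}$. This proves T72.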
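The algebra behind this construction can be sketched numerically: Kraus operators satisfying (400) define an isometry as in (401), and projecting the ancilla $\mathsf{c}$ onto outcome $l$ while tracing out $\mathsf{d}$ recovers the map in (399). The code below assumes NumPy; the random Kraus operators and the function names `random_instrument`, `dilation` and `outcome_state` are illustrative assumptions, and the sketch covers only the linear algebra, not the circuit of permutations and phase gates.

```python
import numpy as np

def random_instrument(dim_a, dim_b, n_outcomes, n_kraus, seed=0):
    """Random Kraus operators E[l][i] (dim_b x dim_a) obeying sum E^dagger E = I."""
    rng = np.random.default_rng(seed)
    shape = (n_outcomes, n_kraus, dim_b, dim_a)
    raw = rng.normal(size=shape) + 1j * rng.normal(size=shape)
    total = sum(e.conj().T @ e for e in raw.reshape(-1, dim_b, dim_a))
    # Multiply by total^(-1/2) so that the completeness relation (400) holds.
    vals, vecs = np.linalg.eigh(total)
    inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.conj().T
    return raw @ inv_sqrt

def dilation(kraus):
    """Matrix V with V|u> = sum_{l,i} (E[l,i]|u>) (x) |l> (x) |i>, as in (401)."""
    n_out, n_kr, dim_b, dim_a = kraus.shape
    # Row index is ordered (b, l, i); column index is a.
    return kraus.transpose(2, 0, 1, 3).reshape(dim_b * n_out * n_kr, dim_a)

def outcome_state(kraus, rho, outcome):
    """Project c onto |outcome><outcome| and trace out d after the dilation."""
    n_out, n_kr, dim_b, _ = kraus.shape
    v = dilation(kraus)
    big = (v @ rho @ v.conj().T).reshape(dim_b, n_out, n_kr, dim_b, n_out, n_kr)
    block = big[:, outcome, :, :, outcome, :]
    return np.einsum("bici->bc", block)

kraus = random_instrument(2, 2, 3, 2)
isometry = dilation(kraus)
rho = np.array([[0.25, 0.1], [0.1, 0.75]], dtype=complex)
direct = [sum(e @ rho @ e.conj().T for e in kraus[l]) for l in range(3)]
via_dilation = [outcome_state(kraus, rho, l) for l in range(3)]
```

The columns of `isometry` are orthonormal, which is the content of (402), and the outcome-conditioned states agree with the direct Kraus sums, with traces adding up to one.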

The complete set of operations in T72 can simulate any complete set of
operations whether they constitute a set of preparations, a set of
transformations, or a set of results (a measurement). In the case that we are simulating a
preparation we should think of $\mathsf{a}$ as the trivial system (having $N_a = 1$) and we
should send in a pure state (that is normalized). In the case we are simulating
an effect we should, likewise, think of $\mathsf{b}$ as being the trivial system (having
$N_b = 1$) and we should place a maximal effect in the output. Interestingly, we
can also consider the case where both $\mathsf{a}$ and $\mathsf{b}$ are trivial systems and where we
send in a pure state and have a maximal effect in the output. In this case, we
obtain a complete set of circuits. Each circuit will have a probability associated
with it. T72 applied to this special case proves that we can obtain any set of
probabilities, $p[l]$, that add up to one.

Axiom 2 follows from T72. Hence we have derived quantum theory (for
finite dimensional Hilbert space) from the five postulates P1-5. Further, classical
probability theory and quantum theory are the only two theories consistent with
P1, P2, P3, P4', and P5.

Part V

Discussion 

13 The nature of the reconstruction 

The reconstruction given in this paper is in the context of an operational frame- 
work. This raises questions about the role of operationalism in fundamental 
physics which we will discuss below (in Sec. 18). Many other reconstructions
are also cast in operational frameworks (see Sec. 14). What is it that distinguishes
the reconstruction here? The most notable feature is the special role
played by maximal sets of distinguishable states. They are mentioned explicitly 
or implicitly (by reference to maximal effects and measurements) in three of 
the five postulates. Concepts that are defined with respect to maximal sets of 
distinguishable states play an important role both in setting up the basic ideas 
(such as filters and systems) and in constructing the proofs in this paper. Such 
maximal sets of distinguishable states are a good concept for a reconstruction 
because they are very basic in our classical conception of the world. Further, 
without distinguishable states in the world we would not be able to do very 
much. For example, this paper could not be written as we would not have an 
alphabet by which to form words. It is possible (even likely) that, in some 
deeper theory, distinguishable states are an approximate concept. Distinguish- 
able states allow us to carry information forward in time. If we have indefinite 
causal structure (as in a theory of quantum gravity) then there will not be a 
fundamental notion of "forward in time" and so we may not have distinguishable 
states at the deepest level. 

14 Previous work on reconstruction 

In this section we will consider the relationship of the operational postulates
presented here to some previous papers that take a similar point of view (such
as adopting tomographic locality as a postulate).

In 2001 the author provided a set of operationally motivated axioms for 
quantum theory [36]. In modern form (see [40]), these are 

Information Systems having, or constrained to have, a given information car- 
rying capacity have the same properties. 

Information locality Same as P2. 

Tomographic locality Same as P3. 

Continuity There exists a continuous reversible transformation between any 
pair of pure states. 

Simplicity States are specified by the smallest number of probabilities consis- 
tent with the other axioms. 
Causality was implicit as a background assumption in [36] whereas it follows as
a theorem (T18) in the present work. The biggest improvement in the present
work is that we get rid of the simplicity axiom. We do this using the technique
discovered by Chiribella, D'Ariano, and Perinotti [H] [15] in which the output
of a teleportation transformation is fed into the input. This technique has to be
adapted to the present work since we start with rather different postulates. The
information axiom of [36] is a rather strong constraint. In the present work it
follows very naturally from more basic postulates. The idea is that we can swap
the state onto a second system by choosing an appropriate permutation of a set
of distinguishable states for the composite system. The continuity axiom of [36]
is no longer necessary. The continuity axiom has two parts: a transitivity part
and a continuity part. The transitivity part in the continuity axiom (that there
exists a reversible transformation between any pair of pure states) now follows
as a theorem (T31) from P1, P2, P3 and P4'. The continuity property of
this axiom does not follow at a low level in the reconstruction. If we use P4
(so we assume compound permutability) then we immediately deduce that
there is at least one pure state "between" any pair of distinguishable states for
a gebit. This takes us a little way in the direction of the continuity property.
It is striking that this is enough to construct the continuum of pure states (as
done finally in T66).

In 2009 Dakic and Brukner [17] attempted a reconstruction from the following
axioms: (1) Information, (2) Information locality, (3) Tomographic locality,
(4) Transitivity (that there exists a reversible transformation between any pair 
of pure states), and (5) Gebit state spectrality (any state for a gebit can be 
written as a convex combination of a pair of maximally distinguishable states). 
Causality is taken as a background assumption. Information locality is not ex- 
plicitly stated as an axiom but is explicitly used in the reconstruction. They 
claim that the only two theories consistent with these axioms are classical prob- 
ability theory and quantum theory and suggest that if transitivity is replaced 
by the continuity axiom then quantum theory is singled out. With this substitution
the axioms of Dakic and Brukner are the same as those of [36] except that
gebit state spectrality replaces simplicity. The authors show how it follows that
all points on the hypersphere must correspond to pure states. First they use
transitivity to show that pure states must correspond to points on a hypersphere.
It then follows almost immediately from gebit spectrality that all points
on this hypersphere correspond to pure states. This is an important step in [17]
en route to getting rid of the simplicity axiom. To actually get rid of the need
for the simplicity axiom (i.e. to show that the hypersphere is a regular 2-sphere)
Dakic and Brukner employ a quite remarkable technique involving two gebits.
There is a significant technical difficulty with the proof provided in their paper
as the authors assume that any group of transformations which is transitive
on a $(d-1)$-sphere contains, at least, SO(d). This is not true [55] [TT]. There
are counterexamples for an infinite number of cases with odd $(d-1)$. There is
only one counterexample for the case of even $(d-1)$. For $(d-1) = 6$ the exceptional
Lie group $G_2$ (which is a proper subgroup of SO(7)) is transitive on the
sphere. Now $K = N^r$, $N = 2$ (for a gebit), and $d = K - 1$ (the minus one is for
normalisation of states). Hence we actually only need to consider even spheres
of dimension $2^r - 2$. Unfortunately the 6-sphere is one such case. Hence there
exists one pertinent counterexample relevant to Dakic and Brukner's work that
needs to be addressed. Work by Masanes and Müller [55] suggests this counterexample
can be eliminated. Ignoring this technical gap in the proof of [17],
the main advantage of the present work over the work of Dakic and Brukner is 
that deeper reasons are given for all the points on the hypersphere representing 
pure states. Dakic and Brukner use gebit state spectrality to do this. It is 
better if axioms apply to all types of system rather than being restricted to spe- 
cial cases (such as gebits) . This is particularly true in quantum theory because 
qubits are rather special when compared with general quantum systems. A 
more general statement that Dakic and Brukner could have used would simply 
be state spectrality: any state can be written as a convex combination of the 
states in some maximal set of distinguishable states [37]. It follows from the fact
that density matrices can be diagonalised that this is true in quantum theory.
This property is rather surprising. It means that any state can be written as
a convex sum of $N_a$ extremal points. If we want to include the unnormalised
states then we need to include the null state in our convex combinations and
so we need $N_a + 1$ extremal points. According to Carathéodory's theorem, any
point in any convex set of dimension $K_a$ can be written as a convex combination
of $K_a + 1$ extremal points. Since $K_a = N_a^2$ in quantum theory, this raises
the question of why states in quantum theory can be written in terms of far 
fewer extremal states than we would expect in the generic case. This is the sort 
of thing that should be explained by axioms for quantum theory rather than 
assumed. Postulate P5, that filters are non-mixing and non-flattening, is, we
argue here, more natural than state spectrality in view of the special role that
filters play in allowing us to prepare systems for experiments.
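
The counting in the paragraph above can be made concrete for a qubit ($N_a = 2$). The following is a minimal sketch, not part of the reconstruction itself: it assumes the standard Bloch-vector parametrisation of qubit density matrices, and the function names are hypothetical. It writes a qubit state as a convex combination of two antipodal (hence distinguishable) pure states, and compares the number of extremal points used by state spectrality with the generic Carathéodory count for $K_a = N_a^2$.

```python
import math

def qubit_spectral_decomposition(rx, ry, rz):
    """Decompose the qubit state with Bloch vector (rx, ry, rz).

    Returns two (weight, bloch_vector) pairs. The two Bloch vectors are
    antipodal, so they describe a pair of distinguishable pure states.
    """
    r = math.sqrt(rx * rx + ry * ry + rz * rz)
    if r > 1.0 + 1e-12:
        raise ValueError("Bloch vector must have length at most 1")
    # For the maximally mixed state any axis works; pick z by convention.
    axis = (0.0, 0.0, 1.0) if r == 0.0 else (rx / r, ry / r, rz / r)
    opposite = tuple(-c for c in axis)
    return [((1.0 + r) / 2.0, axis), ((1.0 - r) / 2.0, opposite)]

def extremal_point_counts(n):
    """Extremal points needed for unnormalised states of an N-level system."""
    k = n * n  # K = N^2 in quantum theory
    return {
        "spectrality": n + 1,    # N distinguishable pure states plus the null state
        "caratheodory": k + 1,   # generic count for a K-dimensional convex set
    }
```

For a qubit the two counts are 3 and 5; the gap widens quadratically with $N_a$, which is the feature the text argues should be explained rather than assumed.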

In 2010 Masanes and Müller [55] considered the following axioms: (1) Information,
(2) Tomographic locality, (3) Transitivity, and (4) All measurements
(for a gebit, all mathematically well-defined measurements are allowed by the
theory). Additionally, they have an axiom that gebits are characterized by a
finite number of probabilities (in the present work a similar role is played by
the background assumption Assump2). Causality is taken as a background assumption.
It is shown that classical probability theory and quantum theory are
the only two theories consistent with these axioms. They suggest substituting
the continuity axiom for the reversibility axiom to single out quantum theory.
Note that information locality is not assumed. Rather, Masanes and Müller
derive it from their axioms. They use their axiom that all mathematically well-defined
measurements are allowed to prove that all points on the hypersphere
for a gebit correspond to pure states. They use group theoretic methods to
show the hypersphere must be a 2-sphere (employing a trick used by Dakic and
Brukner along the way). They address the issue of the exception for the 6-sphere
mentioned above by considering and eliminating the exceptional Lie group $G_2$
as a possible space for the gebit. Their 4th axiom appears to be related to gebit
state spectrality (as used by Dakic and Brukner) in that it immediately gives
the property that all points on the hypersphere correspond to pure states in
the same environment of assumptions. Masanes and Müller also show how to
achieve this step in their proof using perfect distinguishability (this is one of the
axioms of Chiribella, D'Ariano, and Perinotti given below).
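
To see why the 6-sphere is the only pertinent exception, the dimension count quoted earlier ($K = N^r$ with $N = 2$ for a gebit, and $d = K - 1$) can be tabulated. This is only an illustrative sketch; the function names are hypothetical and the range of exponents scanned is an arbitrary choice.

```python
def gebit_sphere_dimension(r):
    """Dimension d - 1 of the pure-state hypersphere when K = N**r with N = 2."""
    n = 2
    k = n ** r      # number of probabilities specifying a state
    d = k - 1       # one parameter is removed by normalisation
    return d - 1    # the sphere is the boundary of a d-dimensional ball

def exponents_hitting(sphere_dim, max_r=10):
    """Exponents r (up to max_r) for which the hypersphere has the given dimension."""
    return [r for r in range(1, max_r + 1)
            if gebit_sphere_dimension(r) == sphere_dim]
```

Under these assumptions the sphere dimension is $2^r - 2$, which is always even, so only the single even-dimensional counterexample (the 6-sphere, reached at $r = 3$) has to be excluded.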

The present work was, in part, motivated by [17] and [55]. Dakic and
Brukner showed that there is a route to getting rid of the simplicity axiom.
Then Masanes and Müller dealt with the technical issue mentioned above. Additionally,
they showed that information locality follows from other axioms.
This suggests the following thought: if we turn this round, maybe information
locality can be used to derive some of those other axioms. Indeed, it turns out
that information locality plays a central role in deriving transitivity (T31) and
the fact that systems having the same $N_a$ are equivalent (the information axiom
above). Information locality is a far more natural assumption than transitivity
and the information axiom.

In 2006 D'Ariano initiated his own research program aimed at reconstructing 
quantum theory from operational axioms (see [18] and references therein). This
program culminated in 2010 with the beautiful work of Chiribella, D'Ariano,
and Perinotti (CDP) [13] (see also [2]) who give the following axioms:

Causality The probability of preparations is independent of the choice of ob- 
servations. 

Perfect distinguishability Every state that is not completely mixed can be 
perfectly distinguished from some other state. 

Ideal compression For every state there exists an ideal compression scheme. 

Local distinguishability If two bipartite states are different, then they give 
different probabilities for at least one product experiment. 

Pure conditioning If a bipartite system is in a pure state, then each outcome 
of an atomic measurement on one side induces a pure state on the other. 

Purification Every state has a purification. For fixed purifying system, every 
two purifications of the same state are connected by a reversible transfor- 
mation on the purifying system. 

The local distinguishability axiom is identical to the assumption of tomographic 
locality. CDP explicitly state causality as one of their axioms. They regard 
the first five axioms as being standard in that they define a broad class of 
information processing theories. The purification postulate is then regarded as 
the assumption that singles out quantum theory within this broad class. (We 
could take a similar attitude to the five postulates presented in this paper. P1,
P2, P3, and P4' define a broad class of physical theories developed in Sec. 10
and P5 singles out classical probability theory and quantum theory.) A longer
version of the third axiom is provided: "Ideal compression axiom: every source
of information can be encoded in a suitable physical system in a lossless and
maximally efficient fashion. Here lossless means that the information can be
decoded without error and maximally efficient means that every state of the
encoding system represents a state in the information source." CDP arrived
independently at a similar derivation of information locality (P2 in the present
work) to Masanes and Müller. CDP avoid the need for a simplicity axiom by
a novel technique involving teleportation. This technique has been adopted
in the present work.

15 Infinite dimensional Hilbert spaces? 

Our objective here was simply to deal with finite dimensional Hilbert spaces. 
Many applications of quantum theory make use of an infinite dimensional Hilbert 
space. The postulates given here make sense when $N_a$ is countably infinite.
Hence, the postulates are applicable when we have countably infinite Hilbert
space dimension. Further, in any restriction of such a theory to the finite $N_a$
case we would obtain finite dimensional quantum theory from the postulates.
However, the following questions are open: (a) do postulates P1-5 give a unique
theory in the case where $N_a$ can be countably infinite and, if so, (b) is
this theory standard quantum theory for countably infinite dimensional Hilbert
space? Experimentally we would never be able to distinguish between the finite 
and infinite dimensional cases so the interest in answering these questions has 
more to do with which theories we can formulate with respect to these postu- 
lates (in particular, it is possible to impose various continuous symmetries in the 
case of infinite dimensional Hilbert spaces that cannot be easily accommodated 
in finite dimensional Hilbert spaces). 

16 Quantum field theory 

The duotensor framework approach to quantum theory outlined in Part III
may offer a route to reformulating quantum field theory. In fact, first we could 
consider the problem of formulating probabilistic field theories (for continuous 
fields) in general. The approach in this paper was for finite systems only. By 
this we mean that (i) any operation or fragment has a finite number of inputs 
and outputs, and (ii) each system type is associated with a finite $K_a$. To do field
theory for continuous fields we would need to relax these requirements. One pos- 
sible way to proceed is the following. Work with a fixed Minkowski background. 
Each fragment would be associated with a spacetime region having a boundary. 
Fragments have settings and outcome sets. The setting might be imposed by 
some classical apparatus that is part of the experiment. The outcomes would 
be read off detectors or other output devices. We are able to wire together 
two fragments if some part of their boundaries fit together (this may require a 
boost). We can consider infinitesimal areas on the boundary. Associated with 
each infinitesimal area with outward normal pointing to the future would be an 
output. Associated with each infinitesimal area with outward normal pointing 
to the past would be an input. Associated with any part of the boundary with 
normal pointing in a spacelike direction would be both an input and an output. 
The type associated with the input and output would be determined by (a) the
type of field and (b) the invariant area of the infinitesimal. Under system com- 
position the areas would add. By imposing full decomposability of operations 
we could set up the duotensor framework so long as we are able to consistently 
replace sums with integrals as required. If this worked it would allow us to 
do both classical field theory (for a probabilistic version of electromagnetism 
for example) and quantum field theory. To do quantum field theory we could 
use the operator duotensor approach. Associated with each region of spacetime 
having a specified setting and outcome set would be a positive operator. These 
operators could be combined by the circuit trace operation to form an operator 
for a composite region. 

We could not do general relativity this way since we have assumed a fixed 
background metric. 

This approach to field theory would enforce a different attitude from that normally
taken by field theorists in two respects. First, the need to think about settings
and outcomes makes this an operational rather than ontological approach.
Second, the approach is intrinsically probabilistic. In the case of quantum the- 
ory it is fundamentally based on objects (operators) which are generalizations 
of density matrices and completely positive maps rather than pure states and 
unitary evolution. In particular, the full decomposability property of operators 
makes most sense in this setting. 

17 Quantum Gravity 

When Newton accounted for Kepler's laws of planetary motion in terms of his
three laws of motion and his universal law of gravitation he actually did more.
He also accounted for the motion of the bodies in arbitrary gravitational systems.
By finding a deeper explanation of the ad hoc laws of Kepler he was able to go
beyond Kepler's physics to new physics. Einstein's derivation of the Lorentz 
transformations from two simple axioms paved the way for the development 
of general relativity (by Einstein), and once again this approach led to new 
physics. The great open problem in fundamental physics today is the problem 
of quantum gravity. This is to find a theory that reduces, in appropriate limits, 
to the physics of quantum theory on the one hand, and the physics of general 
relativity on the other. The new theory may be as different mathematically 
from either quantum theory or general relativity as these theories are from the 
physics that preceded them. To obtain such a theory it is most likely that 
we need to understand the present theories at a deep conceptual level. The 
approach of reformulating them in mathematical and operational terms is likely 
to be helpful here. 

General relativity and quantum theory are both conservative and radical 
compared with the physics that preceded them, but in complementary respects. 
General relativity is conservative in that it is deterministic. It is radical in that 
it has non-fixed causal structure. Quantum theory is conservative in that it has 
fixed causal structure. It is radical in that it is intrinsically probabilistic (it 
cannot be formulated in its standard form without resort to probabilities). It
seems likely that a theory of quantum gravity will inherit the radical features 
of both theories. In fact, we can expect it to be a little more radical still. In 
general relativity causal structure is non-fixed. However, once we have a solu- 
tion to the field equations, the metric is everywhere given. Hence the causal 
structure is definite. However, in quantum theory, any quantity that can vary 
is subject to quantum superposition. This leads to a "no-matter-of-factness" or 
"fundamental indefiniteness" about the value of the quantity. While we may 
not expect the mathematics that leads to linear superpositions to survive in 
quantum gravity (the theory could be formulated in terms of very different 
mathematics) it is likely that the qualitative feature of "no-matter-of-factness" 
will survive. Since causal structure varies in general relativity this suggests that 
we will have fundamentally indefinite causal structure in quantum gravity. If so
then there will be no matter of fact as to whether a particular interval is
spacelike or timelike. The circuit model considered here has wires. The obvious 
interpretation of these wires is that they allow systems to pass from one appa- 
ratus use to another. In other words, they are the circuit analogues of timelike 
intervals. This interpretation is enforced by the property of causality. All this 
suggests that we need to work out how to formulate operational theories with- 
out using the notion of timelike intervals at a fundamental level. The causaloid 
framework [3SJ [32] is a preliminary effort in this direction. 

In an operational approach to physics we need to start by considering how we 
go about describing the world operationally. Typically this consists of consider- 
ing small parts of the world and specifying different ways in which they can be 
connected up. The operational framework established in Part II is, most likely,
insufficient for the task of accommodating quantum gravity for two reasons. 
First, it started with the idea of aligning the apertures on apparatus uses (such 
that we can imagine systems passing through). However, at the operational level 
we can imagine experiments that cannot be described in these terms. Indeed, 
for the reasons given above, this aspect of the circuit model may be particu- 
larly problematic for quantum gravity. Second, the circuit model deals with the 
case where we calculate the probability for outcomes on the operations given 
a fixed wiring. In general relativity, however, we are interested in predicting 
coincidences between bodies. In some probabilistic version of general relativity 
(where we have probabilistic ignorance as to the value of observable quantities) 
we would be interested in calculating probabilities for different configurations 
of coincidences. Graphically this would correspond to calculating probabilities 
for different graphs. Preliminary ideas in this direction were outlined in [42].

18 Operational methodology and ontology 

Our ultimate objective in fundamental physics must be to gain the deepest un- 
derstanding of the world that is possible. This should give an account of what 
is real at the deepest level. That is, it should provide an ontology. We have 
adopted an operational approach in this paper. It might be thought that this is 
inconsistent with an attempt to discover the correct ontology of the world. This
might be true if we were pursuing operationalism as a fundamental philosophy 
(in which it is asserted that there is no reality beyond instrument settings and 
readings as described at the macroscopic level). However, we do not need to take 
this attitude. Rather we can view the approach taken in this paper as an op- 
erational methodology aimed at gaining a deeper insight into certain structural 
properties of quantum theory. The point of adopting a certain methodology 
is that it may help us along the road in constructing the next fundamental 
physical theory (such as a theory of quantum gravity). Operationalism as a 
methodology played an essential role in Einstein's approach to special relativ- 
ity. By thinking in operational terms, Einstein was able to see that the idea of 
absolute simultaneity need not be a feature of the fundamental ontology. This 
was a pretty dramatic success for the operational methodology. It is difficult to 
imagine how he could have had this insight had he not thought in operational 
terms (about how to synchronize distant clocks and so on). Likewise, we can 
hope that the operational methodology will help us to divest ourselves of unnec- 
essary ontological notions we currently take for granted that should play no role 
in quantum theory and, possibly, beyond. An operational methodology enables 
us to proceed in a conceptual manner in the absence of a deeper fundamental 
picture of the world. It is better that physics is driven by conceptual ideas than 
purely mathematical ones. In particular, if we can formulate some essential idea 
in operational terms then it is more mobile as we move between mathematical 
frameworks. 

It is, in some respects, deeply shocking how successful operationalism is for 
the purposes of reconstructing quantum theory. The five postulates given in 
this paper can all be understood in operational terms. The entire framework 
of quantum theory can be derived from operational ideas. Any ontology must 
account for the success of operationalism. Of course, there are things that 
appear in applications of quantum theory that have not been derived such as 
particular Hamiltonians along with the constants that appear in them. But 
surely quantum theory itself is more fundamental than the values of particular 
constants that appear in various applications. 

While our ultimate aim ought to be to come up with an ontology, we should 
perhaps slightly temper the rush to get there. It is likely that quantum gravity (or
whatever more fundamental theory supersedes quantum theory) will look quite
different from quantum theory (as suggested in Sec. 17). In such a case there is a
danger that the exercise of finding the best ontology for quantum theory will be 
entirely academic. It could turn out that none of this ontology passes over to the 
more fundamental theory. Then the world would not actually be as suggested 
by this ontology. On the other hand, it is possible that the path to the
next fundamental theory will be via an incorrect ontological understanding of
the world. Indeed, there are many examples in the history of physics where
something like this has happened. In other words, we might usefully pursue
an ontological methodology alongside, or instead of, an operational one. Of
course, most physicists are not so dispassionate. In the end we are driven to 
search for what we hope will turn out to be the correct ontology of the world. 
After all, it is the desire to understand what reality is like that burns deepest
in the soul of any true physicist. 
Postlude: a conjecture



We conjecture that P5 can be replaced with the following postulate 

P5' Filters are non-mixing. 

To motivate this conjecture, consider the way in which the non-flattening assumption
for filters was used in the reconstruction. In Sec. 11.3 we considered
a set of states for a getrit that had support on a particular gebit space (call this
$V_1$). We then sent them through a filter acting on the getrit that filtered down
to a different gebit space (call this $V_2$). It was clear that any state in the original
gebit space would pass through the filter with some probability and that the
smallest system that would support the outgoing states was a gebit (associated
with $V_2$). Assume all the incoming states lie on the surface of the convex cone
of states associated with the incoming gebit (this means they are proportional
to pure states). If the incoming set of states is non-flat for $V_1$ then they would
span the space of this convex cone. The outgoing set of states must lie on the surface
of the cone associated with the outgoing gebit as we are still assuming that
filters are non-mixing. However, if the filtering transformation were flattening
in this case then these outgoing states would not span the space of states of
this cone; they would be flattened into a lower dimensional space. We know
from T37 that the filtering transformation is a projection transformation into
$V_2$. It is very difficult to take a set of points on the surface of a cone and flatten
them into a lower dimensional space by a projective map while keeping them on
the surface of the cone. Certainly, if all points on the cone are in our set then
there can be no such transformation. However, we were using the non-flattening
assumption exactly to show that all points on the surface of the cone (which
has a hypersphere base) must be present, so this does not help us prove the conjecture.
It is possible to imagine sets of points on the initial cone that could be
flattened. One strategy to prove the conjecture would be to show that it follows
from the postulates that the set of points representing states on the cone is such
that it cannot be flattened.

The non-flattening property of filters was also used in proving T56. There we
also had an input gebit space and an output gebit space so the above strategy
should cover this case also.

Even if the conjecture is proven, the non-flattening property remains a deeply 
interesting property of quantum theory and, in this paper, it drives two key parts 
of the reconstruction: showing that all points on the hypersphere correspond 
to states and showing that we can prepare any state with a teleportation-type 
transformation. These two proofs are important in showing that $K_a = N_a^2$.
Appendices



A Filters non-flattening in quantum theory 

In this appendix we prove that filters are non-flattening in quantum theory.
In fact we will prove that all non-mixing transformations are non-flattening in
quantum theory. First, though, consider filters. Assume we have a filter which
acts on a Hilbert space $\mathcal{H}_a$. Consider states corresponding to density operators
acting on $\mathcal{H}_a$. Assume we have a set of such states which have support on a subspace,
$\mathcal{H}_b$ (and there exists no smaller subspace supporting these states). Such
a set is non-flat if it spans the space of positive operators acting on $\mathcal{H}_b$ (because
we can have a subset of outcomes for a maximal measurement associated with
this subspace). Now consider sending this set through a filter, F (this could
be any filter). Let the set of states which emerges from the filter have support
on the subspace $\mathcal{H}_c$ (where there exists no smaller subspace supporting these
states). This set of output states is non-flat if it spans the space of positive
operators acting on $\mathcal{H}_c$ (because we can have a filter that projects onto this
subspace). We will say the transformation is non-flattening if a non-flat set of
input states gives rise to a non-flat set of output states. In quantum theory the
map on density operators due to any transformation is linear. By linearity, if a
transformation is non-flattening for some (spanning) input set having support
on a given $\mathcal{H}_b$, then it is non-flattening for any other (spanning) input set having
support on $\mathcal{H}_b$. Hence, we just need to prove that the filter is non-flattening
for some non-flat input set for every $\mathcal{H}_b$. In fact, it is sufficient to prove that
there exists some set of input states having support on $\mathcal{H}_b$ which give rise to a
non-flat set of output states having support on $\mathcal{H}_c$ (where there exists no smaller
subspace supporting these output states). If this is the case then we can, clearly,
complete the input set into a non-flat set without affecting the property that
the output set is non-flat also.

Now consider sending a non-flat set of states having support on $\mathcal{H}_b$ onto
the filter. We will assume that, after the filter, the smallest subspace that the
states have support in is of dimension $N_c$. We will refer to this as system c. Let
the Hilbert space for c be spanned by the orthonormal set $\{|n\rangle : n = 1 \text{ to } N_c\}$.
Hence, 

Filters are non-mixing. If we send in a state $|\psi\rangle$ we get out a state $F|\psi\rangle$
(where F is the projector associated with the filter). Since the output states have
support on $\mathcal{H}_c$ there must exist $N_c$ linearly independent pure states,
$\{|u'_n\rangle : n = 1 \text{ to } N_c\}$, in the output set (linearly independent when represented as vectors
in $\mathcal{H}_c$) since this is what it means for c to be the smallest system. Let the input
state that leads to $|u'_n\rangle$ be $|u_n\rangle$. Define

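$$|u_{mnx}\rangle = \frac{1}{\sqrt{2}}\left(|u_m\rangle + |u_n\rangle\right) \quad\text{and}\quad |u_{mny}\rangle = \frac{1}{\sqrt{2}}\left(|u_m\rangle + i|u_n\rangle\right) \qquad (413)$$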

Consider the states 

$A = \{\sigma_n : n = 1 \text{ to } N_c\} \cup \{\sigma_{mnx}, \sigma_{mny} : m, n = 1 \text{ to } N_c,\ m < n\}$ (414)

If we send the set of states $A$ onto the filter then we will get the states

$A' = \{\sigma'_n : n = 1 \text{ to } N_c\} \cup \{\sigma'_{mnx}, \sigma'_{mny} : m, n = 1 \text{ to } N_c,\ m < n\}$ (416)

out (which are defined as above but with primes on the $u$'s). Now this set of
states is non-flat. To prove this we use the result, due to Duan and Guo [5D], that
there exists a linear quantum operation which converts any linearly independent
set of states such as $|u'_n\rangle$ to the same number of orthogonal states $\sqrt{\gamma_n}|n\rangle$. This
transformation is probabilistic and the success probability is greater than zero
for each $n$. Hence, $\gamma_n > 0$ for all $n$. Since the transformation is linear, the state
$a|u'_m\rangle + b|u'_n\rangle$ gets converted to $a\sqrt{\gamma_m}|m\rangle + b\sqrt{\gamma_n}|n\rangle$. Hence, the states in $A'$
get converted to the states

$A'' = \{\rho_n : n = 1 \text{ to } N_c\} \cup \{\rho_{mnx}, \rho_{mny} : m, n = 1 \text{ to } N_c,\ m < n\}$ (417)

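Here we define

(418)

with

$|mnx\rangle = \frac{1}{\sqrt{2}}\left(\sqrt{\gamma_m}|m\rangle + \sqrt{\gamma_n}|n\rangle\right)$ and $|mny\rangle = \frac{1}{\sqrt{2}}\left(\sqrt{\gamma_m}|m\rangle + i\sqrt{\gamma_n}|n\rangle\right)$ (419)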

The set of states in $A''$ certainly constitutes a spanning set for system c. Since
it is impossible for a linear transformation to turn a flat set into a non-flat set
for a system of the same dimension, the states in set $A'$ (immediately after the
filter) must also be non-flat. Hence, filters are non-flattening.
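The spanning claim can also be checked directly in a small case. The following is a minimal sketch, not part of the original proof: it assumes three hypothetical linearly independent, non-orthogonal output vectors (the list `us` is made up for illustration), leaves the projectors unnormalized (which does not change their span), and tests whether operators of the form in (416) span the $N_c^2$-dimensional real space of Hermitian operators. It bypasses the Duan and Guo construction and simply computes a rank with exact arithmetic.

```python
from fractions import Fraction
from itertools import combinations


def outer(v):
    """Unnormalized projector |v><v| as a nested list."""
    return [[a * b.conjugate() for b in v] for a in v]


def as_real_vector(op):
    """Flatten an operator into real coordinates: real parts, then imaginary parts."""
    flat = [z for row in op for z in row]
    return ([Fraction(int(z.real)) for z in flat]
            + [Fraction(int(z.imag)) for z in flat])


def rank(rows):
    """Rank over the rationals by Gauss-Jordan elimination."""
    rows = [r[:] for r in rows]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][col] != 0:
                f = rows[i][col] / rows[rk][col]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[rk])]
        rk += 1
        if rk == len(rows):
            break
    return rk


def output_set(us):
    """Operators for u_n, u_m + u_n and u_m + i u_n (m < n), as in (413)-(416)."""
    ops = [outer(u) for u in us]
    for m, n in combinations(range(len(us)), 2):
        x = [a + b for a, b in zip(us[m], us[n])]
        y = [a + 1j * b for a, b in zip(us[m], us[n])]
        ops += [outer(x), outer(y)]
    return ops


# Hypothetical example vectors: linearly independent but not orthogonal (N_c = 3).
us = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
ops = output_set(us)
spanning_rank = rank([as_real_vector(op) for op in ops])
diagonal_only_rank = rank([as_real_vector(outer(u)) for u in us])
print(len(ops), spanning_rank, diagonal_only_rank)
```

A rank equal to $N_c^2$ means the set spans; the projectors onto the $|u'_n\rangle$ alone have a smaller rank, which is why the $x$ and $y$ superpositions are included.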

Although we presented this proof for filters, the only property of filters it
depends on is that they enact a transformation of the form $|\psi\rangle \to E|\psi\rangle$ for
some operator $E$. This captures the non-mixing aspect of filters. Indeed, in
quantum theory, all non-mixing transformations are of the form $\rho \to G\rho G^\dagger$.
Hence, in general, in quantum theory all non-mixing transformations are also
non-flattening.



B Proof that linearity follows from mixing 

In Sec. 5.4 we introduced the idea of representing a state by a list of probabilities,
$A^{a_1}$, associated with a minimal set of fiducial results, $\{\mathsf{X}^{a_1}_{\mathsf{a}_1} : a_1 = 1 \text{ to } K_{\mathsf{a}}\}$, in
such a way that

$\text{Prob}(\mathsf{A}^{\mathsf{a}_1}\mathsf{B}_{\mathsf{a}_1}) = A^{a_1}B_{a_1}$ (420)

for any effect, $\mathsf{B}_{\mathsf{a}_1}$. In particular, we chose $K_{\mathsf{a}}$ big enough that probability is
given by a linear function of the probabilities in $A^{a_1}$. We can always do this
since we can, if necessary, choose $A^{a_1}$ to be a list of probabilities for all results
(i.e. the set of fiducial results could consist of all results for this type of system).
We might, however, imagine that there exists a smaller set of fiducial results
such that the probability $\text{Prob}(\mathsf{A}^{\mathsf{a}_1}\mathsf{B}_{\mathsf{a}_1})$ is given by a nonlinear function of the
probabilities in this list. We will call the process of going from a list of all
probabilities to a minimal fiducial list of probabilities (sufficient to specify the
state) physical compression. This name is appropriate because the physical laws
of the theory allow us to decompress this compressed list and calculate a general
probability. We will prove that

T73 If we allow arbitrary probabilistic mixtures of preparations 
then (1) linear physical compression is optimal and (2) optimal phys- 
ical compression is necessarily linear. 

This proof was previously given in ([IS]). To prove the first point we observe
that, under linear compression, there must exist $K_{\mathsf{a}}$ linearly independent states,
$\{A^{a_1}[k] : k = 1 \text{ to } K_{\mathsf{a}}\}$, since otherwise we could compress further. We can
take an arbitrary probabilistic mixture of these linearly independent states with
weightings $\lambda_k \geq 0$ where $\sum_k \lambda_k \leq 1$. Note we need not have the $\lambda_k$'s sum to
one because we can include an extra weighting, $\lambda_0$, for the null state. With this
extra weighting they would sum to one. The $\lambda_k$'s (for $k = 1$ to $K_{\mathsf{a}}$) can all
be varied independently in the convex sum

$\sum_k \lambda_k A^{a_1}[k]$ (421)

giving rise to a volume of dimension $K_{\mathsf{a}}$ in the state space. Hence, we need $K_{\mathsf{a}}$
probabilities to specify the state and so linear compression is optimal. To prove
the second point consider representing the state by a list of $K_{\mathsf{a}}$ probabilities,
$A[k]$ ($k = 1$ to $K_{\mathsf{a}}$), where we do not demand that the general probability is
given by a linear function. The entries in $A[k]$ correspond to some set of fiducial
results, $\mathsf{Y}_{\mathsf{a}_1}[k]$, such that

$A[k] := \text{Prob}(\mathsf{A}^{\mathsf{a}_1}\mathsf{Y}_{\mathsf{a}_1}[k])$ (422)

Now, since linear compression is optimal, there must by (1) exist a set of $K_{\mathsf{a}}$
fiducial results, $\mathsf{X}^{a_1}_{\mathsf{a}_1}$, for which the physical compression is linear. Let $A^{a_1}$ be
the list of probabilities with respect to this set of fiducial results. We can write

$A[k] = \text{Prob}(\mathsf{A}^{\mathsf{a}_1}\mathsf{Y}_{\mathsf{a}_1}[k]) = A^{a_1}Y_{a_1}[k]$ (423)

We can think of $Y_{a_1}[k]$ as a matrix with entries labeled by $a_1 k$. This matrix
must be invertible since otherwise we could specify $A[k]$ with fewer than $K_{\mathsf{a}}$
probabilities. Hence, we can write

$\text{Prob}(\mathsf{A}^{\mathsf{a}_1}\mathsf{B}_{\mathsf{a}_1}) = A^{a_1}B_{a_1} = \left((Y_{a_1}[k])^{-1}A[k]\right)B_{a_1}$ (424)

but this equation is linear in $A[k]$. Hence, optimal compression is linear.
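The second step can be illustrated with a small numerical sketch. It assumes a hypothetical system with $K_{\mathsf{a}} = 2$; the matrix `Y`, the state list `A_lin` and the effect `B` are made-up numbers, not taken from the paper. The code compresses a state as in (423), decompresses it by inverting `Y` as in (424), and confirms that the resulting probability rule is linear in the compressed list.

```python
from fractions import Fraction as F

# Hypothetical invertible matrix Y[a][k], standing in for Y_{a_1}[k] in (423).
Y = [[F(1), F(1, 2)],
     [F(0), F(1, 2)]]


def inverse_2x2(m):
    """Exact inverse of a 2x2 matrix; raises if the matrix is singular."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def compress(A_lin):
    """A[k] = sum_a A^a Y_a[k], as in (423)."""
    return [sum(A_lin[a] * Y[a][k] for a in range(2)) for k in range(2)]


def prob_from_compressed(A_k, B):
    """Prob = sum_{k,a} A[k] (Y^{-1})[k][a] B_a, as in (424)."""
    Y_inv = inverse_2x2(Y)
    A_lin = [sum(A_k[k] * Y_inv[k][a] for k in range(2)) for a in range(2)]
    return sum(A_lin[a] * B[a] for a in range(2))


A_lin = [F(1, 3), F(2, 3)]   # hypothetical state in the linear representation
A2_lin = [F(1, 2), F(1, 4)]  # a second hypothetical state
B = [F(1, 2), F(1, 4)]       # hypothetical effect

direct = sum(a * b for a, b in zip(A_lin, B))
via_compressed = prob_from_compressed(compress(A_lin), B)

# Linearity check: a 50/50 mixture of compressed lists gives the mixed probability.
mixed_list = [(x + y) / 2 for x, y in zip(compress(A_lin), compress(A2_lin))]
mixed_prob = prob_from_compressed(mixed_list, B)
average_prob = (via_compressed + prob_from_compressed(compress(A2_lin), B)) / 2
print(direct, via_compressed, mixed_prob == average_prob)
```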

This technique for proving that linearity follows from allowing arbitrary mixtures
is much simpler than the approach adopted in [36].



C Compactness



In this appendix we consider the vector formed from fiducial probabilities char- 
acterizing fragments with a given input-output structure. We consider the case 
where only a finite number of fiducial probabilities are required. We show that 
Assump 3 implies that the space of such vectors is compact (i.e. bounded and 
closed). This means that the sets of allowed states, effects, and transformations, 
are compact for the finite dimensional cases we are interested in. It also implies 
that the sets of allowed duotensors associated with fragments having a given 
input-output structure are compact (so long as they are specified by a finite 
number of fiducial probabilities). 

We consider the cases where we can characterize fragments having a given
input-output structure by a finite set of probabilities defined with respect to
a finite set of $K$ fiducial fragments, $\{\mathsf{X}[k] : k = 1 \text{ to } K\}$, each of which can
complete any fragment having this input-output structure into a circuit (i.e.
they have the mirror input-output structure). Any two fragments that are
equivalent have the same set of probabilities

$\mathbf{p}_{\mathsf{A}} := \{\text{Prob}(\mathsf{X}[k]\mathsf{A}) : k = 1 \text{ to } K\}$ (425)

(this is understood to be a vector with the given components). We use slightly
antiquated notation here - in Part III we see that this object can correspond to
a duotensor with all black dots. If we assume full decomposability (or, equivalently,
tomographic locality) then $K$ is equal to the product of the $K_{\mathsf{a}}$'s associated
with each input and output. In any case, we choose $K$ to be just sufficient
that we can write

$\text{Prob}(\mathsf{A}\mathsf{C}) = \mathbf{r}_{\mathsf{C}} \cdot \mathbf{p}_{\mathsf{A}}$ (426)

The argument for being able to do this is the same as in Sec. 5.4. The $\mathbf{r}$ vectors
belong to some set, $R$, associated with this mirror input-output structure.

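We will now show that it follows from Assump 3 that the set of allowed
$\mathbf{p}_{\mathsf{A}}$ is compact. We know that probabilities are bounded by 0 and 1. Further,
if we put $\delta = a/l$ where $a$ is a constant and $l$ is a positive integer, then we
have a convergent sequence, $\{\mathsf{A}[a/l] : l = 1, 2, \dots\}$. Since, by Assump 3, the limit
point exists, the set of allowed $\mathbf{p}_{\mathsf{A}}$ must be closed with respect to an appropriately
chosen norm. One norm that will do this is just $\|\cdot\|$ defined by

$\|\mathbf{p}\| = |\mathbf{p}|$ (427)

since, if $\mathsf{A}$ and $\mathsf{B}$ are operationally indiscernible to accuracy $\delta$ then it follows
from the linearity of (426) that there exists a constant $c$ such that $|\mathbf{p}_{\mathsf{A}} - \mathbf{p}_{\mathsf{B}}| \leq c\delta$.
To see this we note that

$|\mathbf{p}_{\mathsf{A}} - \mathbf{p}_{\mathsf{B}}| = \max_{\mathbf{x}}\, \mathbf{x} \cdot (\mathbf{p}_{\mathsf{A}} - \mathbf{p}_{\mathsf{B}})$ where $|\mathbf{x}| = 1$ (428)

We can expand any unit vector, $\mathbf{x}$, in terms of a (not necessarily orthonormal)
basis set of vectors chosen from $R$,

$\mathbf{x} = \sum_{k=1}^{K} x_k \mathbf{r}_k$ (429)

Then we can write

$\mathbf{x} \cdot (\mathbf{p}_{\mathsf{A}} - \mathbf{p}_{\mathsf{B}}) = \sum_{k=1}^{K} x_k \mathbf{r}_k \cdot (\mathbf{p}_{\mathsf{A}} - \mathbf{p}_{\mathsf{B}})$ (430)

There must exist some finite $\alpha$ such that $\alpha > |x_k|$ for all $k$ and for all unit
vectors $\mathbf{x}$. Then we have

$\mathbf{x} \cdot (\mathbf{p}_{\mathsf{A}} - \mathbf{p}_{\mathsf{B}}) \leq K\alpha\delta$ (431)

Hence,

$|\mathbf{p}_{\mathsf{A}} - \mathbf{p}_{\mathsf{B}}| \leq K\alpha\delta$ (432)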

as required. 
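Spelled out, the chain of inequalities behind (431) and (432) can be sketched as follows. This assumes that each basis vector $\mathbf{r}_k$ is the $\mathbf{r}$ vector of some completing fragment, labeled $\mathsf{C}_k$ here purely for illustration, and that operational indiscernibility to accuracy $\delta$ means $|\text{Prob}(\mathsf{A}\mathsf{C}) - \text{Prob}(\mathsf{B}\mathsf{C})| \leq \delta$ for every such fragment.

```latex
\begin{align}
\mathbf{x}\cdot(\mathbf{p}_{\mathsf{A}}-\mathbf{p}_{\mathsf{B}})
  &= \sum_{k=1}^{K} x_k\,\mathbf{r}_k\cdot(\mathbf{p}_{\mathsf{A}}-\mathbf{p}_{\mathsf{B}}) \\
  &= \sum_{k=1}^{K} x_k\,\bigl[\mathrm{Prob}(\mathsf{A}\mathsf{C}_k)-\mathrm{Prob}(\mathsf{B}\mathsf{C}_k)\bigr] \\
  &\leq \sum_{k=1}^{K} |x_k|\,\delta \\
  &\leq K\alpha\delta .
\end{align}
```

Taking the maximum over unit vectors $\mathbf{x}$ on the left-hand side then gives the bound on $|\mathbf{p}_{\mathsf{A}} - \mathbf{p}_{\mathsf{B}}|$ with $c = K\alpha$.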

We have shown that the p vectors belong to compact sets. In duotensor 
language this means that the duotensors with all black dots belong to compact 
sets. However, we can use the hopping metric (and its inverse) to change the 
form of the duotensor (so it does not necessarily have all black dots). The 
hopping metric is invertible and hence duotensors of a given form always belong 
to compact sets. 



D Transforming duotensors 

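For an object to be a duotensor it must transform appropriately under trans-
formation of the fiducial preparations and results. We will indicate the original
fiducial preparations and results by $\mathsf{X}$ and the new set by $\bar{\mathsf{X}}$. Then we can write
the old in terms of the new. For results we have

$\mathsf{X}^{a_1}_{\mathsf{a}_1} = \Xi^{a_1}_{\bar{a}_1}\,\bar{\mathsf{X}}^{\bar{a}_1}_{\mathsf{a}_1}$ (433)

where $\Xi^{a_1}_{\bar{a}_1}$ is the transformation matrix between fiducial sets of results. For
preparations we have

${}_{a_1}\mathsf{X}^{\mathsf{a}_1} = {}^{\bar{a}_1}_{a_1}\Pi\;{}_{\bar{a}_1}\bar{\mathsf{X}}^{\mathsf{a}_1}$ (434)

where ${}^{\bar{a}_1}_{a_1}\Pi$ is the transformation matrix between fiducial sets of preparations.
Consider

$\mathsf{A}^{\mathsf{c}_3\mathsf{d}_4}_{\mathsf{a}_1\mathsf{b}_2} = {}^{c_3 d_4}A_{a_1 b_2}\,\mathsf{X}^{a_1}_{\mathsf{a}_1}\mathsf{X}^{b_2}_{\mathsf{b}_2}\;{}_{c_3}\mathsf{X}^{\mathsf{c}_3}\;{}_{d_4}\mathsf{X}^{\mathsf{d}_4}$

$= {}^{\bar{c}_3 \bar{d}_4}\bar{A}_{\bar{a}_1 \bar{b}_2}\,\bar{\mathsf{X}}^{\bar{a}_1}_{\mathsf{a}_1}\bar{\mathsf{X}}^{\bar{b}_2}_{\mathsf{b}_2}\;{}_{\bar{c}_3}\bar{\mathsf{X}}^{\mathsf{c}_3}\;{}_{\bar{d}_4}\bar{\mathsf{X}}^{\mathsf{d}_4}$

Clearly

${}^{\bar{c}_3 \bar{d}_4}\bar{A}_{\bar{a}_1 \bar{b}_2} = {}^{c_3 d_4}A_{a_1 b_2}\;\Xi^{a_1}_{\bar{a}_1}\,\Xi^{b_2}_{\bar{b}_2}\;{}^{\bar{c}_3}_{c_3}\Pi\;{}^{\bar{d}_4}_{d_4}\Pi$ (435)

This equation shows how a duotensor transforms if it has only pre-superscripts
and subscripts. To see how it transforms if we have indices in other positions
we note

$\text{Prob}(\mathsf{A}^{\mathsf{a}_1}\mathsf{B}_{\mathsf{a}_1}) = A^{a_1}B_{a_1} = \bar{A}^{\bar{a}_1}\bar{B}_{\bar{a}_1} = \bar{A}^{\bar{a}_1}\,\Xi^{a_1}_{\bar{a}_1}B_{a_1}$ (436)

where we transform the subscript as in (435). This equation must hold for any
$B_{a_1}$. Hence we must have

$A^{a_1} = \bar{A}^{\bar{a}_1}\,\Xi^{a_1}_{\bar{a}_1}$ (437)

or

$\bar{A}^{\bar{a}_1} = A^{a_1}\,\Xi^{\bar{a}_1}_{a_1}$ (438)

where $\Xi^{\bar{a}_1}_{a_1}$ is the inverse of $\Xi^{a_1}_{\bar{a}_1}$ such that

$\Xi^{\bar{a}_1}_{a_1}\,\Xi^{a_1}_{\bar{b}_1} = \delta^{\bar{a}_1}_{\bar{b}_1}$ (439)

Hence superscripts on duotensors transform with $\Xi^{\bar{a}_1}_{a_1}$.
By considering

$\text{Prob}(\mathsf{A}^{\mathsf{a}_1}\mathsf{B}_{\mathsf{a}_1}) = {}^{a_1}A\;{}_{a_1}B = {}^{\bar{a}_1}\bar{A}\;{}_{\bar{a}_1}\bar{B}$ (440)

and employing similar reasoning to that above, we can easily prove that pre-
subscripts transform with ${}^{a_1}_{\bar{a}_1}\Pi$, this being the inverse of ${}^{\bar{a}_1}_{a_1}\Pi$, i.e.

${}^{a_1}_{\bar{a}_1}\Pi\;{}^{\bar{b}_1}_{a_1}\Pi = {}^{\bar{b}_1}_{\bar{a}_1}\delta$ (441)

Hence, the transformation rule for a duotensor with indices in all positions is
illustrated by

${}^{\bar{c}_3}_{\bar{a}_1}\bar{A}^{\bar{d}_4}_{\bar{b}_2} = {}^{c_3}_{a_1}A^{d_4}_{b_2}\;{}^{\bar{c}_3}_{c_3}\Pi\;{}^{a_1}_{\bar{a}_1}\Pi\;\Xi^{\bar{d}_4}_{d_4}\,\Xi^{b_2}_{\bar{b}_2}$ (442)

We see that a duotensor that has only subscripts and superscripts (i.e. is in stan-
dard form) transforms as a tensor with respect to the transformation matrix for
results. A duotensor that has only pre-superscripts and pre-subscripts trans-
forms as a tensor with respect to the transformation matrix for preparations.
However, a duotensor with indices in all positions behaves like a new object
that transforms with transformation matrices for the results and the prepara-
tions. Further, there exists a hopping metric which can take indices from the
left to the right and vice-versa. The duotensor is a generalization of the idea of
a tensor. It has particular application to operational probabilistic theories. We
should note that we have a choice of fiducial results and fiducial preparations
for each type. In general we do not expect $K_{\mathsf{a}}$ and $K_{\mathsf{b}}$ to be equal. Hence the
indices for different types will, in general, run over different numbers of values.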
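The invariance argument for superscripts and subscripts can be checked numerically. The following is a minimal sketch under stated assumptions: `xi` is a hypothetical invertible 2x2 matrix standing in for the transformation matrix between fiducial sets of results, and `A` and `B` are made-up component lists for a state (superscript index) and an effect (subscript index). If the subscript transforms with `xi` and the superscript with its inverse, the contracted probability is unchanged.

```python
from fractions import Fraction as F


def inverse_2x2(m):
    """Exact inverse of a 2x2 matrix; raises if the matrix is singular."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


# Hypothetical transformation matrix xi[a][abar] (old index a, new index abar).
xi = [[F(1), F(1)],
      [F(0), F(2)]]
xi_inv = inverse_2x2(xi)  # xi_inv[abar][a]

A = [F(1, 2), F(1, 4)]  # hypothetical components with a superscript index
B = [F(1, 3), F(2, 3)]  # hypothetical components with a subscript index

# Subscripts transform with xi: B_new[abar] = sum_a B[a] * xi[a][abar].
B_new = [sum(B[a] * xi[a][ab] for a in range(2)) for ab in range(2)]

# Superscripts transform with the inverse: A_new[abar] = sum_a A[a] * xi_inv[abar][a].
A_new = [sum(A[a] * xi_inv[ab][a] for a in range(2)) for ab in range(2)]

prob_old = sum(a * b for a, b in zip(A, B))
prob_new = sum(a * b for a, b in zip(A_new, B_new))
print(prob_old, prob_new)
```

The same check applies to pre-superscripts and pre-subscripts with the preparation matrix in place of `xi`.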



E Proof that $K = N^r$

We will prove that if we have an integer-valued function, $K(N)$, satisfying (i)
$K(N + 1) > K(N)$, (ii) $K(N_A N_B) = K(N_A)K(N_B)$, for $N = 1, 2, 3, \dots$ then
$K = N^r$ for $r = 1, 2, 3, \dots$. This was first proven (for the purpose of recon-
structing quantum theory) in [36]. Here we give the simpler proof in [47]. We
can expand $N$ as a product of primes. Thus, we write $N = \prod_i (p_i)^{n_i}$ where $p_i$
is the $i$th prime and $n_i$ is the power of this prime in the expansion. It follows
from (ii) that

$K(N) = \prod_i [K(p_i)]^{n_i}$ (443)

We will prove that $K(p_i) = (p_i)^r$ for a fixed value of $r$ that is independent
of $p_i$. To prove this we will assume the converse and obtain a contradiction.
Thus, assume that there exist two primes, $p$ and $q$, such that $K(p) = p^{r_p}$ and
$K(q) = q^{r_q}$ where $r_p \neq r_q$. Let $N_A = p^a$ and $N_B = q^b$ where $a$ and $b$ are positive
integers. Using (443) we obtain

$\frac{\log K_B}{\log K_A} = \frac{b\, r_q \log q}{a\, r_p \log p}$ (444)

We also have

$\frac{\log N_B}{\log N_A} = \frac{b \log q}{a \log p}$ (445)

The two real numbers $\log q/\log p$ and $(r_q/r_p)(\log q/\log p)$ are distinct since we
assume $r_p \neq r_q$. Hence we can choose a value of $a/b$ that is strictly in be-
tween these two real numbers. If we do this then it follows that the ratios
$\log K_B/\log K_A$ and $\log N_B/\log N_A$ lie on opposite sides of 1. This contradicts
(i). Hence we must have $K(p_i) = p_i^r$ for all $p_i$. Using (443) we immediately
obtain $K = N^r$. Since $K$ must be an integer for all $N$, $r$ must take non-negative
integer values. We cannot have $r = 0$ by (i). Hence $r = 1, 2, \dots$.
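The role of condition (i) can be seen in a small experiment. The sketch below is illustrative only: it builds a multiplicative function from (443) using a per-prime exponent rule (the helper names and the two example rules are assumptions made here), then searches for the first $N$ at which $K(N+1) > K(N)$ fails. A uniform exponent passes, while mixing exponents between primes breaks monotonicity almost immediately.

```python
def prime_factors(n):
    """Return {prime: power} for a positive integer n, by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def K(n, exponent_for_prime):
    """Multiplicative K built from (443) with K(p) = p ** exponent_for_prime(p)."""
    result = 1
    for p, power in prime_factors(n).items():
        result *= p ** (exponent_for_prime(p) * power)
    return result


def first_violation(exponent_for_prime, limit=200):
    """Smallest N below limit with K(N + 1) <= K(N), or None if (i) holds throughout."""
    for n in range(1, limit):
        if K(n + 1, exponent_for_prime) <= K(n, exponent_for_prime):
            return n
    return None


def uniform(p):
    """Same exponent for every prime, so K(N) = N ** 2."""
    return 2


def mixed(p):
    """Hypothetical rule with a different exponent for the prime 2."""
    return 1 if p == 2 else 2


print(first_violation(uniform), first_violation(mixed))
```

With the mixed rule, $K(3) = 9$ but $K(4) = 4$, so condition (i) already fails at $N = 3$.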



F The duotenzor drawing package 

All figures in this work were drawn using version 1.1 of the duotenzor pack-
age [45]. Version 1.1 has additional commands to draw operator boxes. The
duotenzor drawing package (spelled with a z) is a purpose built package for
drawing circuits and duotensor diagrams [H]. It consists of about eighty com-
mands (defined using the LaTeX \newcommand command) that call on the TikZ
package written by Till Tantau [70]. Here is a simple example. The code on the
left produces the example on the right.

\begin{diagram}
\Opbox{A}{0,0}
\Opbox{B}{2,4}
\wire{A}{B}{1}{2}
\end{diagram}

Here \Opbox{B}{2,4} puts a box at coordinate (2,4) with the symbol B in
it. The \wire{A}{B}{1}{2} command draws a wire from output 1 of box A
to input 2 of box B. A comprehensive tutorial for the package is provided with
this package [5S] (see also the appendix to [H]). For drawing the kind of
circuits used in quantum computing papers the Q-circuit package written by
Brian Eastin and Steve Flammia [21] (which is powered by the XY-pic LaTeX
package) is more suitable than the duotenzor package.